Add hardware-aware canonicalization

jtuyls commented 2 weeks ago

Canonicalization of the offsets/strides/sizes in doubly-strided operations, like DMA ops, can produce values that no longer fit in the bits available in the hardware buffer descriptor fields. This PR adds logic to the AMDAIECanonicalizeDoublyStridedOpPass to skip canonicalization when it would lead to such an overflow.
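
As a rough illustration of the idea (not the code in this PR), the sketch below folds two adjacent dimensions into one only when the folded size still fits in the hardware field. The `AccessPattern` struct, the `foldLinearDims` name and the `maxSize` limit are hypothetical and chosen for the example; the real limits depend on the device and DMA type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the sizes/strides of a strided access pattern,
// ordered from outermost to innermost dimension.
struct AccessPattern {
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;
};

// Folds adjacent dimensions (outer, inner) into a single dimension when they
// describe one linear walk, i.e. outerStride == innerSize * innerStride.
// The fold is skipped if the combined size would exceed `maxSize`, an assumed
// upper bound for the hardware size field.
AccessPattern foldLinearDims(const AccessPattern &in, int64_t maxSize) {
  assert(in.sizes.size() == in.strides.size());
  AccessPattern out;
  for (std::size_t i = 0; i < in.sizes.size(); ++i) {
    int64_t size = in.sizes[i];
    int64_t stride = in.strides[i];
    if (!out.sizes.empty()) {
      int64_t outerSize = out.sizes.back();
      int64_t outerStride = out.strides.back();
      bool linear = outerStride == size * stride;
      bool fits = outerSize * size <= maxSize;
      if (linear && fits) {
        // Same addresses, one dimension fewer.
        out.sizes.back() = outerSize * size;
        out.strides.back() = stride;
        continue;
      }
    }
    out.sizes.push_back(size);
    out.strides.push_back(stride);
  }
  return out;
}
```

With sizes `[4, 8]` and strides `[8, 1]`, a limit of 1023 gives the folded pattern size `[32]`, stride `[1]`, while a limit of 16 leaves the pattern unchanged because 32 would not fit.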

Note that there is more work to be done on avoiding overflow of available hardware bits and making it more robust, which is not addressed in this PR:

Using aie-rt's transaction data structure and APIs to generate the control code transaction instead of the manual transaction creation we're currently doing: https://github.com/nod-ai/iree-amd-aie/blob/20867183c610ec870344d074c4be78e0aaac1515/compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeToTransaction.cpp#L33. As aie-rt has a lot of checks and will throw errors if hardware field bits are exceeded, this will make the flow way more robust, in the sense that errors will be thrown instead of hard-to-debug silent overflows, leading to hangs or numerical errors.
A pass that can decanonicalize offsets/strides/sizes access patterns, for example when the available bits are already exceeded before canonicalization because the strides/sizes derived from the pack operations are too large. This should typically not happen, but in general DMA ops could be created that exceed hardware limits, and a decanonicalization transformation would let the flow handle more cases.
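
To make the decanonicalization idea a bit more concrete, here is a minimal sketch of splitting one oversized dimension into an (outer, inner) pair that visits the same addresses. This is not an existing pass; `Dim`, `splitDim`, `maxSize` and `maxStride` are hypothetical names and limits used only for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical single dimension of a strided access pattern.
struct Dim {
  int64_t size;
  int64_t stride;
};

// Tries to split `dim` into (outer, inner) with
//   inner.size * outer.size == dim.size,
//   inner.stride == dim.stride,
//   outer.stride == inner.size * dim.stride,
// such that both sizes fit in `maxSize` and the outer stride fits in
// `maxStride`. Returns std::nullopt when no such factorization exists.
std::optional<std::pair<Dim, Dim>> splitDim(Dim dim, int64_t maxSize,
                                            int64_t maxStride) {
  if (dim.size <= 0 || maxSize <= 0) return std::nullopt;
  for (int64_t inner = std::min(dim.size, maxSize); inner >= 1; --inner) {
    if (dim.size % inner != 0) continue;
    int64_t outer = dim.size / inner;
    // A smaller inner size only makes the outer size larger, so stop here.
    if (outer > maxSize) break;
    int64_t outerStride = inner * dim.stride;
    // A smaller inner size lowers the outer stride, so keep searching.
    if (outerStride > maxStride) continue;
    return std::make_pair(Dim{outer, outerStride}, Dim{inner, dim.stride});
  }
  return std::nullopt;
}
```

For example, a dimension of size 2048 with stride 1 and an assumed size limit of 1023 splits into an outer dimension of size 4 with stride 512 and an inner dimension of size 512 with stride 1. A prime size above the limit cannot be split this way, which is one reason a real pass would need additional strategies.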

newling commented 2 weeks ago

Can you please clarify these points in the summary:

Moving to transaction based control code generation, as aie-rt will throw errors if hardware field bits are exceeded.

Throwing errors if hardware resources are exceeded sounds like a good thing, so I'm interpreting this as: moving to transaction based control code generation means moving to using aie-rt. Is that correct?

A pass that can decanonicalize offset/strides/sizes access patterns, for example if available bits are already exceeded before canonicalization.

Is this about undoing preexisting overflow?

jtuyls commented 2 weeks ago

Can you please clarify these points in the summary:

Moving to transaction based control code generation, as aie-rt will throw errors if hardware field bits are exceeded.

Throwing errors if hardware resources are exceeded sounds like a good thing, so I'm interpreting this as: moving to transaction based control code generation means moving to using aie-rt. Is that correct?

Yes, using aie-rt APIs and the transaction data structure to generate the control code transaction instead of the manual way we're doing currently, which doesn't perform (m)any checks: https://github.com/nod-ai/iree-amd-aie/blob/20867183c610ec870344d074c4be78e0aaac1515/compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeToTransaction.cpp#L33
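
To illustrate the difference between the two behaviours (this is a generic sketch and does not use or mirror the aie-rt API), compare an unchecked field write that silently drops overflowing bits with a checked one that raises an error. The function names and the 10-bit field width in the usage note are assumptions for the example.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Mask with the lowest `bits` bits set.
inline uint32_t fieldMask(unsigned bits) {
  return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

// Unchecked packing: any bits of `value` beyond the field width are silently
// discarded, which is the hard-to-debug failure mode described above.
uint32_t packUnchecked(uint32_t word, uint32_t value, unsigned shift,
                       unsigned bits) {
  return word | ((value & fieldMask(bits)) << shift);
}

// Checked packing: throws instead of truncating when `value` does not fit.
uint32_t packChecked(uint32_t word, uint32_t value, unsigned shift,
                     unsigned bits, const std::string &name) {
  if (value > fieldMask(bits)) {
    throw std::overflow_error(name + " value " + std::to_string(value) +
                              " does not fit in " + std::to_string(bits) +
                              " bits");
  }
  return word | (value << shift);
}
```

With an assumed 10-bit field, `packUnchecked(0, 1024, 0, 10)` writes 0 into the field without any diagnostic, whereas `packChecked` with the same arguments throws `std::overflow_error`.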

A pass that can decanonicalize offset/strides/sizes access patterns, for example if available bits are already exceeded before canonicalization.

Is this about undoing preexisting overflow?

Yes, for example if the initial strides/sizes generated from the pack would overflow. Typically, this shouldn't happen, but in principle it could.

jtuyls commented 2 weeks ago

@newling I addressed the comments, could you check again?

nod-ai / iree-amd-aie

Add hardware-aware canonicalization #874