Closed jtuyls closed 2 weeks ago
Can you please clarify these points in the summary:
Moving to transaction based control code generation, as aie-rt will throw errors if hardware field bits are exceeded.
Throwing errors if hardware resources are exceeded sounds like a good thing, so I'm interpreting this as: moving to transaction based control code generation means moving to using aie-rt. Is that correct?
A pass that can decanonicalize offset/strides/sizes access patterns, for example if the available bits are already exceeded before canonicalization.
Is this about undoing preexisting overflow?
Can you please clarify these points in the summary:
Moving to transaction based control code generation, as aie-rt will throw errors if hardware field bits are exceeded.
Throwing errors if hardware resources are exceeded sounds like a good thing, so I'm interpreting this as: moving to transaction based control code generation means moving to using aie-rt. Is that correct?
Yes, using aie-rt APIs and the transaction data structure to generate the control code transaction instead of the manual way we're doing currently, which doesn't perform (m)any checks: https://github.com/nod-ai/iree-amd-aie/blob/20867183c610ec870344d074c4be78e0aaac1515/compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeToTransaction.cpp#L33
A pass that can decanonicalize offset/strides/sizes access patterns, for example if the available bits are already exceeded before canonicalization.
Is this about undoing preexisting overflow?
Yes, for example if the initial strides/sizes generated from the pack would overflow. Typically, this shouldn't happen, but in principle it could.
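To make the decanonicalization idea concrete, here is a minimal Python sketch of splitting one oversized dimension into two. It is only an illustration: `split_dim`, `max_size`, and `max_stride` are hypothetical names, the limits are placeholders rather than real hardware field widths, and none of this is taken from the iree-amd-aie code.

```python
from typing import List, Tuple


def split_dim(size: int, stride: int, max_size: int,
              max_stride: int) -> Tuple[List[int], List[int]]:
    """Split one (size, stride) dimension in two if size exceeds max_size.

    Hypothetical helper: max_size and max_stride stand in for hardware
    field limits and are assumptions, not actual AIE values.
    Returns (sizes, strides), outermost dimension first.
    """
    if size <= max_size:
        return [size], [stride]
    # Try the largest inner factor first so the outer dimension stays small.
    for inner in range(min(max_size, size - 1), 1, -1):
        if size % inner != 0:
            continue
        outer = size // inner
        outer_stride = inner * stride
        if outer <= max_size and outer_stride <= max_stride:
            # Same addresses as before: outer * inner elements, spaced by stride.
            return [outer, inner], [outer_stride, stride]
    raise ValueError(f"cannot split size {size} within the given limits")


# A size of 1024 does not fit a limit of 1023, so it becomes 2 x 512.
print(split_dim(1024, 1, max_size=1023, max_stride=1 << 20))
```

A prime size above the limit has no valid two-way split in this sketch, so it raises instead of silently producing an out-of-range pattern.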
@newling I addressed the comments, could you check again?
Canonicalization of the offsets/strides/sizes in doubly-strided operations, like DMA ops, can overflow the number of bits available in the hardware buffer descriptor fields. This PR adds logic to the AMDAIECanonicalizeDoublyStridedOpPass to not canonicalize if doing so would lead to such an overflow.
Note that there is more work to be done on avoiding overflow of the available hardware bits and making it more robust, which is not addressed in this PR:
Moving to transaction-based control code generation, as aie-rt will throw errors if hardware field bits are exceeded.
A pass that can decanonicalize offset/strides/sizes access patterns, for example if the available bits are already exceeded before canonicalization by the initial strides/sizes generated from pack operations. Note that this should typically not happen, but in general DMA ops could be created that exceed hardware limits, and a decanonicalization transformation would let the flow handle more cases.
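As an illustration of the overflow-aware canonicalization described above, the following Python sketch folds adjacent dimensions only when the combined size still fits a limit. The function name `canonicalize` and the `max_size` parameter are hypothetical, the limit values are placeholders, and the real pass works on MLIR ops rather than plain lists.

```python
from typing import List, Tuple


def canonicalize(sizes: List[int], strides: List[int],
                 max_size: int) -> Tuple[List[int], List[int]]:
    """Fold adjacent linear dimensions unless the fold would exceed max_size.

    Dimensions are listed outermost first. max_size is an assumed stand-in
    for the largest value a hardware size field can hold.
    """
    if not sizes:
        return [], []
    out_sizes = [sizes[-1]]
    out_strides = [strides[-1]]
    # Walk from the second-innermost dimension outwards.
    for size, stride in zip(reversed(sizes[:-1]), reversed(strides[:-1])):
        inner_size, inner_stride = out_sizes[0], out_strides[0]
        # The outer dim continues exactly where the inner dim ends.
        is_linear = stride == inner_size * inner_stride
        if is_linear and size * inner_size <= max_size:
            out_sizes[0] = size * inner_size  # fold into the inner dim
        else:
            out_sizes.insert(0, size)  # keep as a separate dimension
            out_strides.insert(0, stride)
    return out_sizes, out_strides


# With room to spare, [2, 512] x [512, 1] folds into a single dimension.
print(canonicalize([2, 512], [512, 1], max_size=2048))
# With a limit of 1023, the fold to 1024 is skipped and the pattern is kept.
print(canonicalize([2, 512], [512, 1], max_size=1023))
```

The check sits inside the fold decision, so an access pattern that is already within limits can never be pushed out of range by canonicalization itself.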