Closed Abhishek-Varma closed 3 months ago
I've made some update to the consumer fusion pass in IREE-AMD-AIE (point 2 in the above PR's description) : https://github.com/nod-ai/iree-amd-aie/pull/351
I cherry-picked that (along with upstream PR) on top of this WIP PR raised and here is the latest IR dump : https://gist.github.com/Abhishek-Varma/c63df3bf5c759adeb01626303341d9ed - it contains the IRs for both "Matmul" as well as "Matmul + Elementwise".
In the above gist you'd see that the last iteration for pure "Matmul" dispatch is now also peeled.
All I did for this was switch off : https://github.com/nod-ai/iree-amd-aie/blob/9d8d51ec03d84e742a3875f1baf3a0bed8ea4ba0/compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEPeelForLoop.cpp#L84-L92 and instead just add peelingType = PeelingType::FirstLast;
(@yzhang93 - can you confirm if that looks okay to you? And if there are some implications of this that's perhaps getting overlooked?)
CC: @MaheshRavishankar @jtuyls @yzhang93
3. The current peeling pass is basing decision on whether we have "Matmul" or "Matmul+Elemwise" to decide whether to peel the final iteration or not -> but WE SHOULD PEEL IRRESPECTIVE. This hopefully is a trivial change (CC: @yzhang93 ) at least w.r.t the individual pass is concerned.
Yes, it's a trivial change. Just want to confirm the objectfifo pipeline requires the last iteration to be peeled, right?
and instead just add
peelingType = PeelingType::FirstLast;
(@yzhang93 - can you confirm if that looks okay to you? And if there are some implications of this that's perhaps getting overlooked?) CC: @MaheshRavishankar @jtuyls @yzhang93
@Abhishek-Varma I already have a branch does this. It works on top of your in flight PR and peels the last iteration and fuses the unpack into the loop. Please refer to this https://github.com/yzhang93/iree-amd-aie/tree/abhishek_fusion_branch
Just want to confirm the objectfifo pipeline requires the last iteration to be peeled, right?
Yes, that's correct. I'm following this IR log : Objectfifo IR log.
and instead just add
peelingType = PeelingType::FirstLast;
(@yzhang93 - can you confirm if that looks okay to you? And if there are some implications of this that's perhaps getting overlooked?) CC: @MaheshRavishankar @jtuyls @yzhang93@Abhishek-Varma I already have a branch does this. It works on top of your in flight PR and peels the last iteration and fuses the unpack into the loop. Please refer to this https://github.com/yzhang93/iree-amd-aie/tree/abhishek_fusion_branch
So, I had a look at the branch - of the 3 commits I believe only the last iteration peeling commit will be required rest seems either already checked into ToM or outdated.
I see that you've explicitly mentioned the last iterations peel, whereas better would be to maybe get rid of the entire snippet within the pass itself which checks whether it is "Matmul + Elemwise"?
Just want to confirm the objectfifo pipeline requires the last iteration to be peeled, right?
Yes, that's correct. I'm following this IR log : Objectfifo IR log.
It doesn't really 'require' that the last iteration is peeled, but it can handle it, so it would be simplest to always peel for now.
The reason I peeled the last iteration was to get around the issue of the pack/unpack back-and-forth between L1 and L2, but I think there's a way around that now without peeling the last iteration as well?
It doesn't really 'require' that the last iteration is peeled, but it can handle it, so it would be simplest to always peel for now.
The reason I peeled the last iteration was to get around the issue of the pack/unpack back-and-forth between L1 and L2, but I think there's a way around that now without peeling the last iteration as well?
I think if the objectfifo backend handles the last iteration peeled, we should, for now, stick with that.
The reason I'm suggesting this is because the peeling of the last iteration also helps in fusing the consumer (tensor.unpack
+ Elementwise).
Else if we choose to go ahead without peeling :-
scf
loop consumer fusion.
ORtensor.unpack
(CC: @MaheshRavishankar )I think if the objectfifo backend handles the last iteration peeled, we should, for now, stick with that.
Yeah, I agree.
Have raised a PR to add a fix to --iree-amdaie-unroll-and-distribute-workgroup
pass : https://github.com/nod-ai/iree-amd-aie/pull/374
This helps fix a crash when trying to use the C++ pipeline -> the amdaie.tile
op is hoisted out to the beginning of the amdaie.workgroup
, but while trying to do so we're creating an invalid IR since the operand of the amdaie.tile
is tied to the loop IV. This itself turned out to be a red-herring since the current implementation of the pass adds a strict check for normalised loop bounds, as a result of which it never unrolls the SCF loop and spuriously hoists the amdaie.tile
op.
CC: @MaheshRavishankar @jtuyls @yzhang93
I think this PR can be closed now that the objectfifo pipeline has been merged with: https://github.com/nod-ai/iree-amd-aie/pull/473? Feel free to reopen if not the case.
This is a WIP PR which will entail the C++ pipeline for Pack-Peel backend lowering to AIE along with the e2e tests.
Sub-action items missing here for a complete e2e flow (in-order of their need) :-