nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0
64 stars 29 forks source link

[WIP] Add C++ pipeline for Pack-Peel backend #368

Closed Abhishek-Varma closed 3 months ago

Abhishek-Varma commented 4 months ago

This is a WIP PR which will entail the C++ pipeline for Pack-Peel backend lowering to AIE along with the e2e tests.

Sub-action items missing here for a complete e2e flow (in-order of their need) :-

  1. Upstream in-flight consumer fusion utility : #88712
  2. An in-flight PR in iree-amd-aie: https://github.com/nod-ai/iree-amd-aie/pull/351 (to be merged after point 1)
  3. The current peeling pass is basing decision on whether we have "Matmul" or "Matmul+Elemwise" to decide whether to peel the final iteration or not -> but WE SHOULD PEEL IRRESPECTIVE. This hopefully is a trivial change (CC: @yzhang93 ) at least w.r.t the individual pass is concerned.
Abhishek-Varma commented 4 months ago

I've made some update to the consumer fusion pass in IREE-AMD-AIE (point 2 in the above PR's description) : https://github.com/nod-ai/iree-amd-aie/pull/351

I cherry-picked that (along with upstream PR) on top of this WIP PR raised and here is the latest IR dump : https://gist.github.com/Abhishek-Varma/c63df3bf5c759adeb01626303341d9ed - it contains the IRs for both "Matmul" as well as "Matmul + Elementwise".

In the above gist you'd see that the last iteration for pure "Matmul" dispatch is now also peeled. All I did for this was switch off : https://github.com/nod-ai/iree-amd-aie/blob/9d8d51ec03d84e742a3875f1baf3a0bed8ea4ba0/compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEPeelForLoop.cpp#L84-L92 and instead just add peelingType = PeelingType::FirstLast; (@yzhang93 - can you confirm if that looks okay to you? And if there are some implications of this that's perhaps getting overlooked?)

CC: @MaheshRavishankar @jtuyls @yzhang93

yzhang93 commented 4 months ago

3. The current peeling pass is basing decision on whether we have "Matmul" or "Matmul+Elemwise" to decide whether to peel the final iteration or not -> but WE SHOULD PEEL IRRESPECTIVE. This hopefully is a trivial change (CC: @yzhang93 ) at least w.r.t the individual pass is concerned.

Yes, it's a trivial change. Just want to confirm the objectfifo pipeline requires the last iteration to be peeled, right?

yzhang93 commented 4 months ago

and instead just add peelingType = PeelingType::FirstLast; (@yzhang93 - can you confirm if that looks okay to you? And if there are some implications of this that's perhaps getting overlooked?) CC: @MaheshRavishankar @jtuyls @yzhang93

@Abhishek-Varma I already have a branch does this. It works on top of your in flight PR and peels the last iteration and fuses the unpack into the loop. Please refer to this https://github.com/yzhang93/iree-amd-aie/tree/abhishek_fusion_branch

Abhishek-Varma commented 4 months ago

Just want to confirm the objectfifo pipeline requires the last iteration to be peeled, right?

Yes, that's correct. I'm following this IR log : Objectfifo IR log.

and instead just add peelingType = PeelingType::FirstLast; (@yzhang93 - can you confirm if that looks okay to you? And if there are some implications of this that's perhaps getting overlooked?) CC: @MaheshRavishankar @jtuyls @yzhang93

@Abhishek-Varma I already have a branch does this. It works on top of your in flight PR and peels the last iteration and fuses the unpack into the loop. Please refer to this https://github.com/yzhang93/iree-amd-aie/tree/abhishek_fusion_branch

So, I had a look at the branch - of the 3 commits I believe only the last iteration peeling commit will be required rest seems either already checked into ToM or outdated.

I see that you've explicitly mentioned the last iterations peel, whereas better would be to maybe get rid of the entire snippet within the pass itself which checks whether it is "Matmul + Elemwise"?

jtuyls commented 4 months ago

Just want to confirm the objectfifo pipeline requires the last iteration to be peeled, right?

Yes, that's correct. I'm following this IR log : Objectfifo IR log.

It doesn't really 'require' that the last iteration is peeled, but it can handle it, so it would be simplest to always peel for now.

The reason I peeled the last iteration was to get around the issue of the pack/unpack back-and-forth between L1 and L2, but I think there's a way around that now without peeling the last iteration as well?

Abhishek-Varma commented 4 months ago

It doesn't really 'require' that the last iteration is peeled, but it can handle it, so it would be simplest to always peel for now.

The reason I peeled the last iteration was to get around the issue of the pack/unpack back-and-forth between L1 and L2, but I think there's a way around that now without peeling the last iteration as well?

I think if the objectfifo backend handles the last iteration peeled, we should, for now, stick with that. The reason I'm suggesting this is because the peeling of the last iteration also helps in fusing the consumer (tensor.unpack + Elementwise).

Else if we choose to go ahead without peeling :-

  1. There'd be another blocker for some time here for enabling nested scf loop consumer fusion. OR
  2. We should then see how to get rid of the tensor.unpack (CC: @MaheshRavishankar )
jtuyls commented 4 months ago

I think if the objectfifo backend handles the last iteration peeled, we should, for now, stick with that.

Yeah, I agree.

Abhishek-Varma commented 4 months ago

Have raised a PR to add a fix to --iree-amdaie-unroll-and-distribute-workgroup pass : https://github.com/nod-ai/iree-amd-aie/pull/374

This helps fix a crash when trying to use the C++ pipeline -> the amdaie.tile op is hoisted out to the beginning of the amdaie.workgroup, but while trying to do so we're creating an invalid IR since the operand of the amdaie.tile is tied to the loop IV. This itself turned out to be a red-herring since the current implementation of the pass adds a strict check for normalised loop bounds, as a result of which it never unrolls the SCF loop and spuriously hoists the amdaie.tile op.

CC: @MaheshRavishankar @jtuyls @yzhang93

jtuyls commented 3 months ago

I think this PR can be closed now that the objectfifo pipeline has been merged with: https://github.com/nod-ai/iree-amd-aie/pull/473? Feel free to reopen if not the case.