Closed Abhishek-Varma closed 1 week ago
I don't think we should pull this pass apart as the IR in between each of those stages isn't stable, for example doing a cse in between would break it. These new passes are really only useful if executed back-to-back in the exact order
iree-amdaie-distribute-core-and-objectfifo
had, which I think demonstrates that they are coupled too tightly to be pulled apart.
You're correct - but I don't think we would want to invoke a --cse
between these individual passes though while we simultaneously work out a better way to rewrite the pass as well as carry out fixes in the individual passes.
Indeed the six passes, for now, will have to be invoked back-to-back (as demonstrated in the matmul_peeled_objectfifo.mlir
as well as I tried the e2e via the WIP C++ pipeline locally).
The main motive is to help making debugging relatively easier - especially w.r.t iree-amdaie-distribute-local-memory
, iree-amdaie-insert-logical-objectfifo-access
and iree-amdaie-assign-tile-locations
- because currently any crash at the bigger pass basically means manually looking at intermediate IRs (the --debug
is quite noisy therefore the main goal was to help with the mlir-print-ir-before/after-all
as we shouldn't have to look at unrolling of the loops, etc that's taking place within distribute-core-and-objectfifo
).
The iree-amdaie-distribute-core-and-objectfifo is indeed overly complicated but I think this is mainly caused by the global memref usage across (possibly) peeled stages of the computation and the need to keep this global memref information around while also distributing some of them on both local and shared memory (different cores and memtiles). Once we figure out how to best do that, I think the pass can be split into (roughly):
- Unroll local for loops
- Distribute memrefs
- Assign AIE tile locations
Since we plan on segregating distribute-core-and-objectfifo
eventually, I believe we can perhaps use this PR to stage it well as it isolates the phases involved (albeit tightly coupled) and then work our way back to make it concise to be just the three set of passes you mentioned ?
Let me know your thoughts.
The main motive is to help making debugging relatively easier ... (the --debug is quite noisy therefore the main goal was to help with the mlir-print-ir-before/after-all as we shouldn't have to look at unrolling of the loops, etc that's taking place within distribute-core-and-objectfifo
For non-trivial changes/debugging you will have to look at multiple of these stages and how they influence each other as they are tightly coupled so I don't think separating the passes will help much and imo it's not a reason enough to pull the pass apart. For example seeing an issue in iree-amdaie-distribute-shared-memory
likely means having to change/fix the transformations above.
Since we plan on segregating distribute-core-and-objectfifo eventually, I believe we can perhaps use this PR to stage it well as it isolates the phases involved (albeit tightly coupled) and then work our way back to make it concise to be just the three set of passes you mentioned ?
I don't see how pulling the pass apart will help with the rewrite as I think the rewrite should go in an entirely different direction and I hope most logic of these passes will just have to be deleted. I think natural segregation happens when the intermediate IR is stable and we should not do it before we get there.
@jtuyls and I had a discussion offline.
So since the current pass is stable, pulling it out currently would make it unstable w.r.t the intermediate IRs as the IR at each stage is dependent on the previous stage.
The pass will definitely need a rewrite though and only then it should be pulled out into the three passes as Jorn mentioned in the above thread.
I'm closing this PR.
@jtuyls and I had a discussion offline.
So since the current pass is stable, pulling it out currently would make it unstable w.r.t the intermediate IRs as the IR at each stage is dependent on the previous stage.
The pass will definitely need a rewrite though and only then it should be pulled out into the three passes as Jorn mentioned in the above thread.
I'm closing this PR.
Yeah, it makes sense to me. Sorry, I clicked the wrong button and reopened the PR.
Can you start to send a PR for the fix we had for vectorization? I think we can first get in the changes for InsertCores
and DistributeCores
passes.
This PR is an attempt to deprecate
--iree-amdaie-distribute-core-and-objectfifo
pass.Reason: A lot of IR modification takes place within the same pass in an atomic way. All the individual phases deserve their own set of passes. Therefore we'd ideally like to have a clean distinction between the individual phases that reflects in
mlir-print-ir-before/after-all
that helps us to debug and by that extension develop/maintain the passes relatively better.With this pass we will have the following new passes now (each in their separate commit along with incremental changes made to eventually deprecate
iree-amdaie-distribute-core-and-objectfifo
pass) :-iree-amdaie-local-forall-to-for
iree-amdaie-distribute-local-memory
iree-amdaie-unroll-local-loops
iree-amdaie-insert-logical-objectfifo-access
iree-amdaie-assign-tile-locations
iree-amdaie-distribute-shared-memory
Besides the above, changes were made to
matmul_peeled_objectfifo.mlir
lit test to incorporate above set of passes (in-order) instead.Also, changes were made to
CMakeLists.txt
ofTransforms
andTransforms/test
- this just helps any of us to not have to nit-pick on adding new passes/lit test files every time - just create a new pass and the lit test - it should work. (I currently have added this as 2 separate commits but after consensus will squash into one).NOTE for reviewers:
Passes.td
in alphabetical order..vmfb
after addressing comments on this PR and get in the other fixes for pure Matmul + Vectorization + Objectfifo incrementally (each fix in their own separate PR).