Open newling opened 1 month ago
Could you add some motivation/comments/logic so it is easy to follow?
I've added a description to the PR, but I'm not sure if that's what you're requesting?
Motivation: `memref.expand_shape` and `memref.collapse_shape` ops appear in the IR when I add `linalg::populateFoldUnitExtentDimsPatterns(ps, options);`, which is needed to get vectorization working for convolution.
Would you like more comments in the code? I am happy to add these if so.
Yes, I think more comments and explanations are needed, especially for the utility functions (e.g. `getLinearCombination`) and for how the offsets/strides are determined for the expand/collapse ops.
@yzhang93 thanks for your review. Please don't review this again until I remove the [WIP]. This is the "large PR" which I said could be eliminated if I could get `linalg-fold-unit-extent-dims` to not create `collapse_shape` and `expand_shape` ops. I'm now experimenting with `useRankReducingSlices = true` in that pass, as suggested by Mahesh, to see what happens. If that works I might abandon this PR.
This PR extends `iree-amdaie-convert-to-dma` to handle more situations.

The goal of the pass `iree-amdaie-convert-to-dma` is to convert `iree_linalg_ext.pack`, `iree_linalg_ext.unpack` and `linalg.copy` operations into `amdaie.dma_cpy_nd` operations. The `linalg.copy` ops are of no concern in this PR, as they are converted to pack and unpack ops very early in this pass. The logic for `iree_linalg_ext.unpack` is essentially the same as for `iree_linalg_ext.pack`, so I will only discuss `iree_linalg_ext.pack` in the next paragraphs.

There are 2 main differences between `iree_linalg_ext.pack` and `amdaie.dma_cpy_nd` which the pass needs to handle.

The first is that operands of `amdaie.dma_cpy_nd` are `amdaie.logicalobjectfifo`s, which are essentially just `memref.alloc` ops (tied to a set of tiles). The operation `iree_linalg_ext.pack`, on the other hand, has operands which are not as 'directly' connected to `memref.alloc` ops, as they can be `memref.subview`s of allocations, or indeed any arbitrary chain of `memref.subview`, `memref.expand_shape`, `memref.collapse_shape`, etc. The pass therefore needs to find the `memref.alloc` at the start of the chain which ultimately defines the operand of the `iree_linalg_ext.pack`, and build the `amdaie.dma_cpy_nd` based on that `memref.alloc`. Before this PR, it was assumed that the chain connecting a `memref.alloc` to an `iree_linalg_ext.pack` was at most a single `memref.subview` op. This PR extends this to a chain of any length. It avoids recursion. Please see the lit tests for examples.

The second is that `amdaie.dma_cpy_nd` has `offsets`, `sizes` and `strides`. These need to be derived from the `iree_linalg_ext.pack` and all of the operations in the chain from the `memref.alloc` to the `iree_linalg_ext.pack`. With this PR, each of the operations `memref.subview`, `memref.collapse_shape` and `memref.expand_shape` in a chain from `memref.alloc` to `iree_linalg_ext.pack` has specific logic for modifying `offsets`, `sizes` and `strides`. The vectors `offsets`, `sizes` and `strides` are initialized at the `memref.alloc`, and then each op in the chain to the `iree_linalg_ext.pack` mutates them. Finally, they are mutated one last time based on the `iree_linalg_ext.pack` op's permutation and tile sizes. Please see the lit tests for examples of these modifications.
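To make the "initialize at the alloc, then let each op mutate" idea concrete, here is a small standalone Python model. It is only a sketch and not the code in this PR: the names `View`, `alloc`, `subview`, `expand_shape`, `collapse_shape` and `pack` are hypothetical helpers, the chain is listed front to back rather than discovered from the IR, and two simplifications are assumed. A single linear element offset is tracked instead of a per-dimension `offsets` vector, and `pack` tiles every dimension in order without an outer permutation.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class View:
    """Element (i_0, ..., i_k) lives at offset + sum(i_d * strides[d])."""
    offset: int
    sizes: list
    strides: list


def alloc(shape):
    # Row-major and contiguous: the innermost dimension has stride 1.
    strides = [prod(shape[d + 1:]) for d in range(len(shape))]
    return View(0, list(shape), strides)


def subview(v, offsets, sizes, strides):
    # Offsets shift the base address; subview strides scale existing strides.
    offset = v.offset + sum(o * s for o, s in zip(offsets, v.strides))
    return View(offset, list(sizes),
                [a * b for a, b in zip(strides, v.strides)])


def expand_shape(v, parts_per_dim):
    # parts_per_dim[d] lists the sizes that dimension d is split into.
    sizes, strides = [], []
    for size, stride, parts in zip(v.sizes, v.strides, parts_per_dim):
        assert prod(parts) == size, "split sizes must multiply to the old size"
        for k, n in enumerate(parts):
            sizes.append(n)
            strides.append(stride * prod(parts[k + 1:]))
    return View(v.offset, sizes, strides)


def collapse_shape(v, groups):
    sizes, strides = [], []
    for group in groups:
        # Unit dimensions never move the address, so they are ignored.
        dims = [(v.sizes[d], v.strides[d]) for d in group if v.sizes[d] != 1]
        for (_, outer), (n, inner) in zip(dims, dims[1:]):
            if outer != n * inner:
                raise ValueError("group is not contiguous, cannot collapse")
        sizes.append(prod(n for n, _ in dims))
        strides.append(dims[-1][1] if dims else v.strides[group[-1]])
    return View(v.offset, sizes, strides)


def pack(v, tiles):
    # Split each dim into (outer, tile), then move all tile dims innermost.
    split = expand_shape(v, [[n // t, t] for n, t in zip(v.sizes, tiles)])
    rank = len(tiles)
    order = list(range(0, 2 * rank, 2)) + list(range(1, 2 * rank, 2))
    return View(split.offset,
                [split.sizes[i] for i in order],
                [split.strides[i] for i in order])


# The chain from the alloc to the pack operand, applied in a plain loop.
chain = [
    (subview, ([0, 2, 8], [1, 4, 8], [1, 1, 1])),
    (collapse_shape, ([[0, 1], [2]],)),
]
v = alloc([1, 8, 16])
for op, args in chain:
    v = op(v, *args)
packed = pack(v, [2, 4])
```

In this example the view reaching the pack has offset 40, sizes `[4, 8]` and strides `[16, 1]`, and after the pack step the sizes are `[2, 2, 2, 4]` with strides `[32, 4, 16, 1]`. The unit-dimension handling in `collapse_shape` is included because unit dimensions are exactly what the fold-unit-extent-dims patterns collapse away.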