nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0

[WIP][AMDAIEConvertToDma] Support memref shape collapse/expand #800

Open newling opened 1 month ago

newling commented 1 month ago

This PR extends iree-amdaie-convert-to-dma to handle more situations.

The goal of the pass iree-amdaie-convert-to-dma is to convert iree_linalg_ext.pack, iree_linalg_ext.unpack and linalg.copy operations into amdaie.dma_cpy_nd operations. The linalg.copy ops are of no concern in this PR, as they are converted to pack and unpack ops very early in this pass. The logic for iree_linalg_ext.unpack is essentially the same as for iree_linalg_ext.pack, so I will only discuss iree_linalg_ext.pack in the next paragraphs.

There are 2 main differences between iree_linalg_ext.pack and amdaie.dma_cpy_nd which the pass needs to handle.

The first is that the operands of amdaie.dma_cpy_nd are amdaie.logicalobjectfifo values, which are essentially just memref.alloc ops (tied to a set of tiles). The operands of iree_linalg_ext.pack, on the other hand, are not as 'directly' connected to memref.alloc ops: they can be memref.subviews of allocations, or indeed any arbitrary chain of memref.subview, memref.expand_shape, memref.collapse_shape, etc. The pass therefore needs to find the memref.alloc at the start of the chain which ultimately defines the operand of the iree_linalg_ext.pack, and build the amdaie.dma_cpy_nd based on that memref.alloc. Before this PR, it was assumed that the chain connecting a memref.alloc to iree_linalg_ext.pack was at most a single memref.subview op. This PR extends this to a chain of any length. The implementation avoids recursion. Please see the lit tests for examples.
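To make the chain walk concrete, here is a minimal sketch in Python rather than the pass's C++. The `Op` class, the `VIEW_OPS` set and `chain_from_alloc` are hypothetical names invented for illustration and are not taken from the PR. The sketch only shows the idea of following defining ops back to the allocation with a loop instead of recursion.

```python
from dataclasses import dataclass
from typing import Optional

# Ops allowed between the allocation and the pack operand (per the PR description).
VIEW_OPS = {"memref.subview", "memref.expand_shape", "memref.collapse_shape"}


@dataclass
class Op:
    """Hypothetical stand-in for an MLIR op: a name plus its defining source op."""
    name: str
    source: Optional["Op"] = None


def chain_from_alloc(operand):
    """Walk from the pack operand back to its memref.alloc, iteratively.

    Returns (alloc, chain), where chain lists the view ops in order from the
    allocation towards the pack operand.
    """
    chain = []
    op = operand
    while op.name != "memref.alloc":
        if op.name not in VIEW_OPS or op.source is None:
            raise ValueError(f"unsupported op in chain: {op.name}")
        chain.append(op)
        op = op.source
    chain.reverse()  # collected pack-to-alloc, so flip to alloc-to-pack order
    return op, chain


alloc = Op("memref.alloc")
view = Op("memref.expand_shape", Op("memref.subview", alloc))
root, chain = chain_from_alloc(view)
print(root.name, [o.name for o in chain])
```

Returning the chain in alloc-to-pack order is a choice made here so that a later step can apply each op's effect in the order the ops were created.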

The second is that amdaie.dma_cpy_nd has offsets, sizes and strides. These need to be derived from the iree_linalg_ext.pack and from all of the operations in the chain from the memref.alloc to the iree_linalg_ext.pack. With this PR, each of the operations memref.subview, memref.collapse_shape and memref.expand_shape in such a chain has specific logic for modifying offsets, sizes and strides. The offsets, sizes and strides vectors are initialized at the memref.alloc, and then each op in the chain to the iree_linalg_ext.pack mutates them. They are then mutated one last time based on the iree_linalg_ext.pack op's permutation and tile sizes.

Please see the lit tests for examples of these modifications.
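As a rough illustration of how an access pattern can be threaded through such a chain, the following Python sketch models each step on static shapes. It is not the code in this PR: the `Access` class and all function names are hypothetical, and it makes simplifying assumptions. It keeps a single linear base offset instead of per-dimension offsets, requires tile sizes to divide the dimensions evenly (no padding), and omits the pack op's outer-dimension permutation.

```python
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass
class Access:
    """Strided view into a flat allocation (hypothetical model)."""
    base: int      # linear element offset into the allocation
    sizes: list
    strides: list

    def addresses(self):
        """Linear addresses visited, in row-major order of the view's indices."""
        return [self.base + sum(i * s for i, s in zip(idx, self.strides))
                for idx in product(*(range(n) for n in self.sizes))]


def from_alloc(shape):
    """Initial pattern at the allocation: contiguous row-major."""
    return Access(0, list(shape), [prod(shape[i + 1:]) for i in range(len(shape))])


def subview(a, offsets, sizes, strides):
    base = a.base + sum(o * s for o, s in zip(offsets, a.strides))
    return Access(base, list(sizes), [t * s for t, s in zip(strides, a.strides)])


def expand_shape(a, reassociation, out_shape):
    """Split each source dim into the result dims listed in its group."""
    sizes, strides = [], []
    for src_dim, group in enumerate(reassociation):
        dims = [out_shape[g] for g in group]
        if prod(dims) != a.sizes[src_dim]:
            raise ValueError("expanded sizes do not multiply to the source size")
        for k, n in enumerate(dims):
            sizes.append(n)
            strides.append(a.strides[src_dim] * prod(dims[k + 1:]))
    return Access(a.base, sizes, strides)


def collapse_shape(a, reassociation):
    """Merge each group of source dims into one dim, if the group is contiguous."""
    sizes, strides = [], []
    for group in reassociation:
        # Unit dims contribute nothing to addressing, so ignore them.
        dims = [(a.sizes[g], a.strides[g]) for g in group if a.sizes[g] != 1]
        for (_, s0), (n1, s1) in zip(dims, dims[1:]):
            if s0 != n1 * s1:
                raise ValueError("group is not contiguous; cannot collapse")
        sizes.append(prod(n for n, _ in dims))
        strides.append(dims[-1][1] if dims else a.strides[group[-1]])
    return Access(a.base, sizes, strides)


def pack(a, inner_dims_pos, inner_tiles):
    """Tile the listed dims; inner tile dims are appended after the outer dims."""
    sizes, strides = list(a.sizes), list(a.strides)
    for pos, tile in zip(inner_dims_pos, inner_tiles):
        if a.sizes[pos] % tile:
            raise ValueError("tile size must divide the dimension (no padding here)")
        sizes[pos] = a.sizes[pos] // tile
        strides[pos] = a.strides[pos] * tile
        sizes.append(tile)
        strides.append(a.strides[pos])
    return Access(a.base, sizes, strides)


# alloc 1x8x16 -> drop the unit dim -> 4x8 window at (2, 4) -> 2x4 tiles
a = from_alloc((1, 8, 16))
a = collapse_shape(a, [[0, 1], [2]])
a = subview(a, offsets=[2, 4], sizes=[4, 8], strides=[1, 1])
a = pack(a, inner_dims_pos=[0, 1], inner_tiles=[2, 4])
print(a)
```

In this model, collapsing is only possible when the dims in a group are contiguous with respect to each other, which is why `collapse_shape` raises on a group that a preceding `subview` has made non-contiguous.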

yzhang93 commented 1 month ago

Could you add some motivation/comments/logic so it is easy to follow?

newling commented 1 month ago

> Could you add some motivation/comments/logic so it is easy to follow?

I've added a description to the PR, but I'm not sure if that's what you're requesting?

Motivation: memref.expand_shape and memref.collapse_shape ops appear when I add the call linalg::populateFoldUnitExtentDimsPatterns(ps, options), which is needed to get vectorization working for convolution.

Would you like more comments in the code? I am happy to add these if so.

yzhang93 commented 1 month ago

> Would you like more comments in the code? I am happy to add these if so.

Yes, I think more comments and explanations are needed, especially for the utility functions (e.g. getLinearCombination) and for how the offsets/strides are determined for the expand/collapse ops.

newling commented 1 month ago

@yzhang93 thanks for your review. Please don't review this again until I remove the [WIP]. This is the "large PR" which I said could be eliminated if I could get "linalg-fold-unit-extent-dims" to not create collapse_shape and expand_shape ops. As suggested by Mahesh, I'm now experimenting with 'useRankReducingSlices = true' in that pass to see what happens. If that works, I might abandon this PR.