IanWood1 opened 3 weeks ago
I'm going to try to turn on the igemm pipeline for this. One of the main pieces required for turning the pipeline on by default is https://github.com/iree-org/iree/pull/18394. After that lands, we will also need some configuration updates.
Note: similar to https://github.com/iree-org/iree/issues/18447, but for matmul. We want to support fusing gather-like `linalg.generic` ops with matmul ops.

### Problem
Due to the small tensor sizes (`tensor<8x7x5xf32>`), this example does not throw any errors from excessive shared memory allocation. But inspecting the dump (and/or using larger tensor sizes) shows that each batch of the "gathered" tensor is materialized, i.e. a full `7x5xf32` slice (and codegen fails when a larger vector size is used).

Another problem is that the `LLVMGPUVectorize` pipeline is being selected. Apparently, either `LLVMGPUVectorDistribute` or the igemm pipeline should be used instead.

### IR/Logs
https://gist.github.com/IanWood1/2f6b5c6af9597d47efbd2506f0cc19b9 contains the executable sources and the original linalg IR.
Here is a dump of the IR after each pass: https://gist.githubusercontent.com/IanWood1/1c2bdb053a4929dca98c019768ffae41/raw/7ab58055d4be208e6cede980a13121dbbf49eac9/pre-gather-matmul.mlir.
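For reference, a minimal sketch of the kind of pattern in question (this is a hypothetical illustration, not the exact IR from the gists above; the function name, index-tensor shape, and source-tensor size are made up, only the `8x7x5` shape comes from the issue): a gather-like `linalg.generic` whose body does a `tensor.extract` at a dynamically computed index, feeding a batch matmul.

```mlir
#map = affine_map<(b, m, k) -> (b, m, k)>
func.func @gather_matmul(%indices: tensor<8x7x5xi32>, %src: tensor<128xf32>,
                         %rhs: tensor<8x5x4xf32>) -> tensor<8x7x4xf32> {
  %empty = tensor.empty() : tensor<8x7x5xf32>
  // Gather-like generic: each output element loads src[indices[b, m, k]].
  %gathered = linalg.generic
      {indexing_maps = [#map, #map],
       iterator_types = ["parallel", "parallel", "parallel"]}
      ins(%indices : tensor<8x7x5xi32>) outs(%empty : tensor<8x7x5xf32>) {
  ^bb0(%idx: i32, %out: f32):
    %i = arith.index_cast %idx : i32 to index
    %v = tensor.extract %src[%i] : tensor<128xf32>
    linalg.yield %v : f32
  } -> tensor<8x7x5xf32>
  %cst = arith.constant 0.0 : f32
  %init = tensor.empty() : tensor<8x7x4xf32>
  %fill = linalg.fill ins(%cst : f32)
      outs(%init : tensor<8x7x4xf32>) -> tensor<8x7x4xf32>
  // The goal is to fuse the gather above into this matmul, instead of
  // materializing each 7x5 batch of %gathered before the contraction.
  %mm = linalg.batch_matmul
      ins(%gathered, %rhs : tensor<8x7x5xf32>, tensor<8x5x4xf32>)
      outs(%fill : tensor<8x7x4xf32>) -> tensor<8x7x4xf32>
  return %mm : tensor<8x7x4xf32>
}
```

Without fusion, the producer `linalg.generic` is codegen'd as its own materialized tensor (per batch), which is where the shared memory pressure described above comes from.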
cc @MaheshRavishankar