iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[GPU] Gather -> matmul fusion support #18457

Open IanWood1 opened 3 weeks ago

IanWood1 commented 3 weeks ago

Note: Similar to https://github.com/iree-org/iree/issues/18447 but for matmul. We want to support fusing gather-like linalg.generic ops with matmul ops.
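For reference, the fusion in question looks roughly like the following reduced sketch: a gather-like `linalg.generic` whose body reads from a source tensor via `tensor.extract` at dynamic indices, feeding a `linalg.batch_matmul`. The shapes mirror the `tensor<8x7x5xf32>` case from this issue, but the function name, source/index shapes, and operand names are illustrative, not taken from the linked gists.

```mlir
// Hypothetical reduced example (not the exact IR from the gists).
func.func @gather_matmul(%source: tensor<128x5xf32>,
                         %indices: tensor<8x7xi32>,
                         %rhs: tensor<8x5x3xf32>) -> tensor<8x7x3xf32> {
  %empty = tensor.empty() : tensor<8x7x5xf32>
  // Gather-like generic: each (b, i, j) element is loaded from %source at a
  // dynamically computed row, which blocks ordinary elementwise fusion.
  %gathered = linalg.generic {
      indexing_maps = [affine_map<(b, i, j) -> (b, i)>,
                       affine_map<(b, i, j) -> (b, i, j)>],
      iterator_types = ["parallel", "parallel", "parallel"]}
      ins(%indices : tensor<8x7xi32>)
      outs(%empty : tensor<8x7x5xf32>) {
  ^bb0(%idx: i32, %out: f32):
    %row = arith.index_cast %idx : i32 to index
    %j = linalg.index 2 : index
    %val = tensor.extract %source[%row, %j] : tensor<128x5xf32>
    linalg.yield %val : f32
  } -> tensor<8x7x5xf32>
  // Consumer matmul; without fusion, %gathered is materialized per batch.
  %cst = arith.constant 0.0 : f32
  %init = tensor.empty() : tensor<8x7x3xf32>
  %fill = linalg.fill ins(%cst : f32)
      outs(%init : tensor<8x7x3xf32>) -> tensor<8x7x3xf32>
  %mm = linalg.batch_matmul
      ins(%gathered, %rhs : tensor<8x7x5xf32>, tensor<8x5x3xf32>)
      outs(%fill : tensor<8x7x3xf32>) -> tensor<8x7x3xf32>
  return %mm : tensor<8x7x3xf32>
}
```

The goal is for codegen to fold the gather into the matmul's operand reads instead of materializing the intermediate `tensor<8x7x5xf32>`.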

Problem

Because the tensor sizes in this example are small (tensor<8x7x5xf32>), it does not trigger errors from excessive shared memory allocation. However, inspecting the IR dump (and/or using larger tensor sizes) shows that each batch of the 'gathered' tensor is fully materialized, i.e. a 7x5xf32 slice per batch (and codegen fails when a larger vector size is used).

Another problem is that the LLVMGPUVectorize pipeline is being selected. Apparently, either LLVMGPUVectorDistribute or the igemm pipeline should be used instead.

IR/Logs

https://gist.github.com/IanWood1/2f6b5c6af9597d47efbd2506f0cc19b9 contains the executable sources & the original linalg IR.

Here is a dump of IR after each pass https://gist.githubusercontent.com/IanWood1/1c2bdb053a4929dca98c019768ffae41/raw/7ab58055d4be208e6cede980a13121dbbf49eac9/pre-gather-matmul.mlir.

cc @MaheshRavishankar

qedawkins commented 3 weeks ago

I'm going to try to enable the igemm pipeline for this. One of the main pieces required to turn the pipeline on by default is https://github.com/iree-org/iree/pull/18394.

After that lands, we will still need some configuration updates.