Open chengjunlu opened 1 month ago
Comments from @whitneywhtsang.
Note: A new agama release should be available this week, and it contains an improvement to instruction scheduling. IMO, we should rely on IGC instruction scheduling instead of implementing our own in Triton.
Let's wait for the new agama release.
After working closely with the IGC team, we can reduce the register spill size to 0 on the default pass.
I need to double-check that the new internal IGC driver works properly, to make sure all the changes are available in the latest IGC driver.
Then we wait for the new agama release.
The register spill is mainly caused by the long live range of the value defined by the `tt.load` operation in the Triton kernel.
Rescheduling the `tt.load` operation close to its user eliminates the register spill in large GRF mode.
IGC is working on enhancing instruction scheduling during codegen.
Before that is available in IGC, we need a quick workaround for the flash attention case on the Triton side.
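
To illustrate why moving a load next to its user lowers register pressure, here is a minimal toy model in Python. It is not Triton or IGC code: the instruction tuples, `peak_live_values`, and `sink_loads_to_users` are hypothetical names made up for this sketch, and counting simultaneously live values is only a rough stand-in for real GRF usage.

```python
from typing import Dict, List, Tuple

# Toy straight-line IR (an assumption for illustration only):
# each instruction is (result_name, opcode, operand_names).
Instr = Tuple[str, str, Tuple[str, ...]]


def peak_live_values(schedule: List[Instr]) -> int:
    """Return the maximum number of simultaneously live values in a schedule."""
    # Record the index of the last instruction that reads each value.
    last_use: Dict[str, int] = {}
    for idx, (_, _, operands) in enumerate(schedule):
        for name in operands:
            last_use[name] = idx

    live = set()
    peak = 0
    for idx, (result, _, operands) in enumerate(schedule):
        live.add(result)                 # the result becomes live here
        peak = max(peak, len(live))
        for name in operands:            # operands die after their last use
            if last_use.get(name) == idx:
                live.discard(name)
        if result not in last_use:       # a never-read result dies immediately
            live.discard(result)
    return peak


def sink_loads_to_users(schedule: List[Instr]) -> List[Instr]:
    """Move each 'load' to just before its first user; keep other ops in order."""
    loads = {ins[0]: ins for ins in schedule if ins[1] == "load"}
    emitted = set()
    out: List[Instr] = []
    for ins in schedule:
        if ins[1] == "load":
            continue                     # loads are re-emitted next to their users
        for name in ins[2]:
            if name in loads and name not in emitted:
                out.append(loads[name])
                emitted.add(name)
        out.append(ins)
    # Loads without any user keep their relative order at the end.
    out.extend(ins for name, ins in loads.items() if name not in emitted)
    return out


# All loads hoisted to the top, similar in spirit to the spilling case.
EARLY_LOADS: List[Instr] = [
    ("a", "load", ()),
    ("b", "load", ()),
    ("c", "load", ()),
    ("d", "load", ()),
    ("acc0", "exp", ("a",)),
    ("acc1", "add", ("acc0", "b")),
    ("acc2", "add", ("acc1", "c")),
    ("acc3", "add", ("acc2", "d")),
]

if __name__ == "__main__":
    print("peak live, loads hoisted:", peak_live_values(EARLY_LOADS))
    print("peak live, loads sunk:   ", peak_live_values(sink_loads_to_users(EARLY_LOADS)))
```

In this toy schedule the peak number of live values drops from 5 to 3 once each load sits directly before its user. A real kernel holds whole tiles per loaded value, so the effect on register usage would be correspondingly larger.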