Open chengjunlu opened 6 days ago
Comments from @whitneywhtsang:
Note: A new agama should be available this week, and it contains an improvement to instruction scheduling. IMO, we should rely on IGC instruction scheduling instead of implementing our own in Triton.
Let's wait for the new agama.
The register spill is mainly caused by the long live range of the value defined by the `tt.load` operation in the Triton kernel. Rescheduling the `tt.load` operation close to its user can eliminate the register spill in large GRF mode. IGC is working on enhancing instruction scheduling during codegen.
Until that is available in IGC, we need a quick workaround for the flash attention case on the Triton side.
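
To illustrate why moving a load next to its user shortens its live range, here is a minimal toy sketch in plain Python. It is not the Triton or IGC implementation: the op list, the `max_live` and `sink_loads` helpers, and the flash-attention-like sequence are all hypothetical, and the model counts live values rather than actual GRF registers.

```python
from typing import List, Tuple

# Hypothetical toy IR: (result_name, opcode, operand_names), straight-line, SSA-like.
Op = Tuple[str, str, Tuple[str, ...]]


def max_live(ops: List[Op]) -> int:
    """Peak number of simultaneously live values defined in `ops`."""
    last_use = {}
    for i, (_, _, operands) in enumerate(ops):
        for name in operands:
            last_use[name] = i
    live, peak = set(), 0
    for i, (result, _, _) in enumerate(ops):
        live.add(result)
        peak = max(peak, len(live))
        # A value dies once its last user has executed (unused values die at once).
        live -= {v for v in live if last_use.get(v, i) <= i}
    return peak


def sink_loads(ops: List[Op]) -> List[Op]:
    """Move every 'load' so it sits immediately before its first user."""
    out = list(ops)
    for load in [op for op in ops if op[1] == "load"]:
        out.remove(load)
        first_user = next(
            (i for i, op in enumerate(out) if load[0] in op[2]), len(out)
        )
        out.insert(first_user, load)
    return out


# Assumed flash-attention-like sequence: v is loaded early but only used last.
before: List[Op] = [
    ("q", "load", ("q_ptr",)),
    ("k", "load", ("k_ptr",)),
    ("v", "load", ("v_ptr",)),
    ("qk", "dot", ("q", "k")),
    ("p", "softmax", ("qk",)),
    ("o", "dot", ("p", "v")),
]
after = sink_loads(before)

print([op[0] for op in after])
print(max_live(before), max_live(after))
```

In this toy model, sinking the load of `v` next to the final `dot` keeps it from staying live across the `qk` and `softmax` steps, which lowers the peak live-value count. A real pass would also have to respect memory ordering and control flow, which this sketch ignores.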