intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

[Productize Flash Attention performance #5] replace Q with slm store/load in attention #1463

Closed · Dewei-Wang-sh closed this 1 month ago

quintinwang5 commented 1 month ago

This feature depends on https://github.com/intel/intel-xpu-backend-for-triton/issues/1461. Currently getting the following failures:

[convert-triton-to-tritongpu-warp]: ***********************************************

[convert-triton-to-tritongpu-warp]: this has tt.dot, but workload do not match any

[convert-triton-to-tritongpu-warp]: ***********************************************

Coredump:
LLVM ERROR: TritonGPU module should contain a triton_gpu.num-warps attribute
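For context, this error indicates that downstream TritonGPU passes expect the module to carry a `triton_gpu.num-warps` attribute, which is normally attached when `convert-triton-to-tritongpu-warp` recognizes the workload; since the pass reports that the `tt.dot` workload does not match any known pattern, the attribute is never set. A minimal sketch of a well-formed TritonGPU module header (the attribute values here are illustrative, not taken from this issue):

```mlir
// Hypothetical module header after a successful conversion pass.
// "triton_gpu.num-warps" is the attribute the LLVM ERROR refers to;
// the concrete values (4 warps, 16 threads/warp) are assumptions.
module attributes {"triton_gpu.num-warps" = 4 : i32,
                   "triton_gpu.threads-per-warp" = 16 : i32} {
  // ... lowered tt.func / tt.dot ops would appear here ...
}
```

In other words, the coredump is a downstream symptom of the pattern-match failure above, not an independent bug.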
Dewei-Wang-sh commented 1 month ago

PR merged.