intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

[FA2 performance] FlashAttention with dim=128 should reach ~90% of XeTLA performance #2339

Open Dewei-Wang-sh opened 2 weeks ago

Dewei-Wang-sh commented 2 weeks ago

Currently, on the main branch, we get ~70% of XeTLA performance.

Dewei-Wang-sh commented 2 weeks ago

Work in progress: analyzing the generated assembly.