[FA2 performance] FlashAttention with dim=128 should reach ~90% of XeTLA performance

intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs

MIT License


Open: Dewei-Wang-sh opened this issue 2 weeks ago

Dewei-Wang-sh commented 2 weeks ago

Currently, on the main branch, we get ~70% of XeTLA performance.

Dewei-Wang-sh commented 2 weeks ago

WIP, analyzing the assembly.