[Performance] Enable the SLPVectorizer to improve flash attention performance

intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs

MIT License


Open chengjunlu opened 1 week ago

chengjunlu commented 1 week ago

Enabling the SLPVectorizer together with IGC_DisablePHIScalarizer improves the overall performance of the flash attention forward kernel by 3%.

We should enable the SLPVectorizer until the IGCVectorizer can do the same transformation.
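For anyone trying to reproduce the measurement, here is a minimal sketch of setting the IGC flag named above for a benchmark run. It assumes IGC_DisablePHIScalarizer is read from the process environment and that the value "1" turns it on; the helper name `igc_env` is hypothetical. How the SLPVectorizer itself gets enabled in the Triton pass pipeline is not stated in this issue, so that part is not shown.

```python
import os


def igc_env(base=None):
    """Return a copy of an environment mapping with the IGC flag set.

    `igc_env` is a hypothetical helper used only for illustration.
    """
    env = dict(os.environ if base is None else base)
    # Assumption: IGC picks this flag up from the environment,
    # and "1" is the value that enables it.
    env["IGC_DisablePHIScalarizer"] = "1"
    return env


if __name__ == "__main__":
    env = igc_env({"PATH": "/usr/bin"})
    print(env["IGC_DisablePHIScalarizer"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching a flash attention benchmark, which keeps the flag scoped to that one process instead of the whole shell session.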