intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

Enable `-tritonintelgpu-optimize-reduction-locality` by default #2748

Open victor-eds opened 1 week ago

victor-eds commented 1 week ago

Once the pending issues listed below are resolved and the corresponding changes merged, this pass can be enabled by default:

Speedups after these changes in victor/perf-test (attn benchmark):

- Min speedup: 0.997086
- Quartile 1: 1.000324
- Median: 1.026954
- Quartile 3: 1.121962
- Max speedup: 1.181823
- Average: 1.064171
- Average if improved (>=1.05): 1.131925
- Improved (>=1.05): 12/26
- Worse (>=1.05): 0
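For anyone wanting to reproduce a summary like the one above from their own per-configuration speedups, here is a minimal Python sketch. It is not the script used for this issue: the function name `summarize_speedups`, the sample values, and the quartile interpolation method are all assumptions. It also assumes that "worse" means a slowdown factor (`1 / speedup`) of at least the threshold, which the issue does not spell out.

```python
import statistics


def summarize_speedups(speedups, threshold=1.05):
    """Summarize per-configuration speedups (baseline time / new time).

    Hypothetical helper: the quartile method and the definition of
    "worse" are assumptions, not taken from the issue.
    """
    # Inclusive quartiles treat the data as the full population of runs.
    q1, median, q3 = statistics.quantiles(speedups, n=4, method="inclusive")
    improved = [s for s in speedups if s >= threshold]
    # Assumed: a configuration is "worse" if it slowed down by >= threshold.
    worse = [s for s in speedups if 1.0 / s >= threshold]
    return {
        "min": min(speedups),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(speedups),
        "average": statistics.mean(speedups),
        "average_if_improved": statistics.mean(improved) if improved else None,
        "improved": len(improved),
        "worse": len(worse),
        "total": len(speedups),
    }


# Made-up sample values, only to show the call shape.
sample = [0.99, 1.0, 1.02, 1.1, 1.2]
summary = summarize_speedups(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Passing the 26 measured speedups from the attn benchmark in place of `sample` should give figures comparable to those listed, though quartile values can differ slightly depending on the interpolation method.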