Once the fixes for the pending issues listed below are merged, this pass can be enabled by default:

https://github.com/intel/intel-xpu-backend-for-triton/issues/2749
https://github.com/intel/intel-xpu-backend-for-triton/issues/2750
https://github.com/intel/intel-xpu-backend-for-triton/issues/2751
https://github.com/intel/intel-xpu-backend-for-triton/issues/2752

Speedups for #2748 (Enable `-tritonintelgpu-optimize-reduction-locality` by default) with these changes applied on `victor/perf-test` (attention benchmark, 26 configurations):

- Min speedup: 0.997086
- Quartile 1: 1.000324
- Median: 1.026954
- Quartile 3: 1.121962
- Max speedup: 1.181823
- Average: 1.064171
- Average over improved cases (>=1.05): 1.131925
- Improved (>=1.05): 12/26
- Worse (slowdown of >=1.05x): 0
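For readers reproducing a summary like the one above, here is a minimal sketch of how per-configuration speedups (baseline time divided by new time) could be condensed into these statistics. It is not the script used to produce the numbers in this PR: the `summarize` helper and `SpeedupSummary` type are hypothetical, the quartile method (`"inclusive"`) is an assumption, and "worse" is assumed to mean a slowdown by at least the same 1.05x factor, i.e. a speedup of at most 1/1.05.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeedupSummary:
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float
    average: float
    average_if_improved: Optional[float]  # None when nothing improved
    improved: int
    worse: int
    total: int


def summarize(speedups: List[float], threshold: float = 1.05) -> SpeedupSummary:
    """Hypothetical helper: summarize speedups (baseline_time / new_time)."""
    if len(speedups) < 2:
        raise ValueError("need at least two measurements for quartiles")
    # Assumption: 'inclusive' quartiles; the original benchmark's method is unknown.
    q1, median, q3 = statistics.quantiles(speedups, n=4, method="inclusive")
    improved = [s for s in speedups if s >= threshold]
    # Assumption: "worse" means slower by at least the same factor as the threshold.
    worse = [s for s in speedups if s <= 1.0 / threshold]
    return SpeedupSummary(
        minimum=min(speedups),
        q1=q1,
        median=median,
        q3=q3,
        maximum=max(speedups),
        average=statistics.fmean(speedups),
        average_if_improved=statistics.fmean(improved) if improved else None,
        improved=len(improved),
        worse=len(worse),
        total=len(speedups),
    )
```

With real measurements, `summarize(speedups)` yields the same fields as the list above, and `f"{s.improved}/{s.total}"` gives the "Improved" ratio.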