Open whitneywhtsang opened 1 week ago
GEMM out-of-box performance has degraded between https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11769279538 and https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11787592466. For example, 4k GEMM degraded by 5%. Looking at the commits between the two runs, the likely cause is ca95a70b226a5b92c4e84a9987d920de4cc23a69, which is intended to improve GEMM of shape 4096x8x128x16384.
For the 4k case, reverting ca95a70 does not recover the performance: run
ca95a70 should not impact non-batched GEMM cases, because the change only touches matmul_kernel_with_block_pointers_batched.
I think this may come from run-to-run variance: the performance for this case came back in this run (which includes ca95a70, PR triggered):
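One way to check whether a change of this size is within run-to-run variance is to compare repeated measurements of the same shape from both builds. The snippet below is only a rough sketch, not part of the benchmark suite: the function names, the threshold `k`, and the idea of collecting several throughput samples per build are all assumptions made for illustration.

```python
import statistics


def relative_change(baseline, candidate):
    """Relative change of the candidate mean vs. the baseline mean.

    Inputs are lists of throughput samples (e.g. TFLOPS) from repeated runs
    of the same GEMM shape; a negative result means the candidate is slower.
    """
    base_mean = statistics.mean(baseline)
    return (statistics.mean(candidate) - base_mean) / base_mean


def within_noise(baseline, candidate, k=2.0):
    """Return True if the mean difference is at most k standard deviations.

    Uses the larger of the two sample standard deviations as a crude noise
    estimate. k=2.0 is an arbitrary, hypothetical threshold.
    """
    diff = abs(statistics.mean(candidate) - statistics.mean(baseline))
    noise = max(statistics.stdev(baseline), statistics.stdev(candidate))
    return diff <= k * noise
```

If `within_noise` returns True for samples taken before and after the commit, the 5% difference is hard to distinguish from variance with this crude check; a proper statistical test would need more samples.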
This issue now mainly depends on the fixes for https://github.com/intel/intel-xpu-backend-for-triton/issues/2733 and https://github.com/intel/intel-xpu-backend-for-triton/issues/2734.