[GEMM] Performance degradation out of box

intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs

MIT License


Open whitneywhtsang opened 1 week ago

whitneywhtsang commented 1 week ago

[Image: Screenshot 2024-11-12 184443]

GEMM out-of-box performance degraded between run https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11769279538 and run https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11787592466. For example, the 4k GEMM case degraded by 5%. Looking at the commits between the two runs, the likely cause is ca95a70b226a5b92c4e84a9987d920de4cc23a69, which is intended to improve GEMM of shape 4096x8x128x16384.

ESI-SYD commented 1 week ago

For the 4k case, reverting ca95a70 does not recover the performance: run

ca95a70 should not affect non-batched GEMM cases, because the change only touches matmul_kernel_with_block_pointers_batched.

I think this may come from run-to-run variance: the 4k case recovered in this run (which includes ca95a70, PR triggered).
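One way to check the variance hypothesis is to compare the gap between two builds against the spread of repeated measurements of each build. The sketch below is only an illustration: the function names `relative_change` and `within_noise`, the two-sigma threshold, and all throughput numbers are assumptions made here, not values taken from the CI runs in this issue.

```python
from statistics import mean, stdev


def relative_change(baseline, candidate):
    """Signed relative change of candidate vs. baseline (negative means slower)."""
    return (candidate - baseline) / baseline


def within_noise(baseline_runs, candidate_runs, n_sigma=2.0):
    """Return True if the gap between the two means is no larger than
    n_sigma times the larger sample standard deviation of the two groups."""
    gap = abs(mean(candidate_runs) - mean(baseline_runs))
    noise = max(stdev(baseline_runs), stdev(candidate_runs))
    return gap <= n_sigma * noise


# Hypothetical throughput samples (TFLOPS) from repeated runs of each build.
# These are made-up numbers for illustration only.
baseline_runs = [100.0, 96.0, 104.0, 98.0, 102.0]
candidate_runs = [95.0, 101.0, 97.0, 103.0, 94.0]

if __name__ == "__main__":
    change = relative_change(mean(baseline_runs), mean(candidate_runs))
    print(f"mean change: {change:+.1%}")
    print("within noise:", within_noise(baseline_runs, candidate_runs))
```

With noisy samples like these, a single pair of runs can differ by 5% even though the means are close, so one CI run per commit may not be enough to attribute a regression to a specific change.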

ESI-SYD commented 1 week ago