intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

Improve GEMM performance of shape 4096x8x128x16384 #2646

Closed: ESI-SYD closed this pull request 2 weeks ago.

ESI-SYD commented 2 weeks ago

This change (adjusting the grid order to improve the cache hit rate) originates from https://github.com/intel/intel-xpu-backend-for-triton/pull/2600. It applies to batched GEMM only. For shape 4096x8x128x16384 it reaches ~99% of XeTLA performance.
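To show why grid order can affect cache hits, here is a minimal sketch. The PR does not say which ordering it adopts, so everything below is an assumption for illustration. The hypothetical helpers `order_batch_outer` and `order_batch_inner` map a linear program id to a `(batch, pid_m, pid_n)` tile. `operand_tiles_in_window` counts how many distinct A and B tiles a window of consecutively scheduled programs reads. That count is a rough proxy for cache pressure: fewer distinct tiles means more reuse.

```python
from typing import Callable, Tuple

# (batch, pid_m, pid_n) coordinates of one output tile
Tile = Tuple[int, int, int]
Order = Callable[[int, int, int, int], Tile]


def order_batch_outer(pid: int, num_batch: int, num_pid_m: int, num_pid_n: int) -> Tile:
    """Hypothetical ordering: batch index varies slowest, so all tiles of
    one batch are scheduled back to back."""
    per_batch = num_pid_m * num_pid_n
    b = pid // per_batch
    rem = pid % per_batch
    return b, rem // num_pid_n, rem % num_pid_n


def order_batch_inner(pid: int, num_batch: int, num_pid_m: int, num_pid_n: int) -> Tile:
    """Hypothetical ordering: batch index varies fastest, so consecutive
    programs work on different batches."""
    b = pid % num_batch
    rem = pid // num_batch
    return b, rem // num_pid_n, rem % num_pid_n


def operand_tiles_in_window(order: Order, window: int, num_batch: int,
                            num_pid_m: int, num_pid_n: int) -> Tuple[int, int]:
    """Count distinct A tiles (batch, pid_m) and B tiles (batch, pid_n)
    read by the first `window` programs under the given ordering."""
    a_tiles, b_tiles = set(), set()
    for pid in range(window):
        b, m, n = order(pid, num_batch, num_pid_m, num_pid_n)
        a_tiles.add((b, m))
        b_tiles.add((b, n))
    return len(a_tiles), len(b_tiles)


if __name__ == "__main__":
    # Toy sizes, not the PR's shape: 4 batches, 1 tile in M, 4 tiles in N.
    for name, order in [("batch outer", order_batch_outer),
                        ("batch inner", order_batch_inner)]:
        print(name, operand_tiles_in_window(order, 4, 4, 1, 4))
```

With these toy sizes, the first four programs under the batch-outer order share a single A tile. Under the batch-inner order they read four different A tiles. Here the window stands in for the set of workgroups resident at the same time. The real effect on Intel GPUs depends on the kernel's tiling and on hardware scheduling, which this sketch does not model.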