openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

[XLA:GPU] Enable fusing of fp8 matmuls through Triton. #14232

Closed: copybara-service[bot] closed this PR 5 days ago

copybara-service[bot] commented 6 days ago

[XLA:GPU] Enable fusing of fp8 matmuls through Triton.

  1. Move the cuBLAS fp8 GEMM rewriter to run after the Triton GemmFusion pass, so that the Triton path has a chance to trigger first.

  2. Don't normalize fp8 types to fp16 for dot instructions, so that fp8 dots are preserved for Triton fusion rather than being upcast (see the sketch after this list).
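
For context, here is a minimal JAX sketch of the kind of fp8 matmul this change targets. The dtypes, shapes, and `preferred_element_type` choice are illustrative assumptions, not taken from the PR.

```python
# Illustrative only: an fp8 matmul that XLA:GPU could fuse through Triton
# instead of routing it to the cuBLAS fp8 GEMM path.
import jax
import jax.numpy as jnp

@jax.jit
def fp8_matmul(x, w):
    # Keep the inputs in fp8; accumulate and output in a wider type.
    return jnp.dot(x, w, preferred_element_type=jnp.bfloat16)

x = jnp.ones((128, 64), dtype=jnp.float8_e4m3fn)
w = jnp.ones((64, 32), dtype=jnp.float8_e4m3fn)
y = fp8_matmul(x, w)
print(y.dtype, y.shape)  # bfloat16 (128, 32)
```

With the pass reordering above, a dot like this (plus surrounding elementwise ops) can stay in fp8 and be picked up by the Triton GemmFusion pass rather than being normalized to fp16 first.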