openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

[XLA:GPU] Enable fusing of fp8 matmuls through Triton. #14232

Closed: copybara-service[bot] closed this PR 5 days ago

copybara-service[bot] commented 6 days ago

[XLA:GPU] Enable fusing of fp8 matmuls through Triton.

  1. Move the cuBLAS fp8 GEMM rewriter to run after the Triton GemmFusion pass, so that the Triton path has a chance to trigger first.

  2. Don't normalize fp8 types to fp16 for dot instructions, so that fp8 dots are preserved for Triton fusion rather than being upcast (see the sketch after this list).
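
For context, here is a minimal JAX sketch of the kind of fp8 matmul this change targets. The dtypes, shapes, and `preferred_element_type` choice are illustrative assumptions, not taken from the PR.

```python
# Illustrative only: an fp8 matmul that XLA:GPU could fuse through Triton
# instead of routing it to the cuBLAS fp8 GEMM path.
import jax
import jax.numpy as jnp

@jax.jit
def fp8_matmul(x, w):
    # Keep the inputs in fp8; accumulate and output in a wider type.
    return jnp.dot(x, w, preferred_element_type=jnp.bfloat16)

x = jnp.ones((128, 64), dtype=jnp.float8_e4m3fn)
w = jnp.ones((64, 32), dtype=jnp.float8_e4m3fn)
y = fp8_matmul(x, w)
print(y.dtype, y.shape)  # bfloat16 (128, 32)
```

With the pass reordering above, a dot like this (plus surrounding elementwise ops) can stay in fp8 and be picked up by the Triton GemmFusion pass rather than being normalized to fp16 first.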