Efficient dequantization with new MLIR backend

triton-lang / triton

Efficient dequantization with new MLIR backend #1049

Open ptillet opened 1 year ago

ptillet commented 1 year ago

see https://github.com/openai/triton/pull/759#issuecomment-1275357306

pommedeterresautee commented 1 year ago

May I ask why this needs special support at the MLIR level? Wouldn't simply multiplying the int8 matmul output by a scale factor already generate optimized PTX code (for a quantized linear layer, for instance)?
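To make the question concrete, here is a minimal sketch of the "multiply the int8 matmul output by a factor" approach in plain Python. It is only an illustration of the arithmetic, not Triton code, and it says nothing about the PTX either backend emits. The helper names (`quantize`, `int_matmul`, `dequantize`) and the power-of-two scales are assumptions made for the example.

```python
# Hypothetical helpers illustrating a quantized linear layer:
# quantize inputs to int8, multiply with integer accumulation,
# then rescale the accumulator once at the end.

def quantize(values, scale):
    """Symmetric per-tensor quantization: round(v / scale), clamped to [-127, 127]."""
    return [[max(-127, min(127, round(v / scale))) for v in row] for row in values]

def int_matmul(a, b):
    """Integer matmul; Python ints stand in for an int32 accumulator."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def dequantize(acc, scale):
    """Rescale the integer accumulator back to floating point."""
    return [[v * scale for v in row] for row in acc]

# Assumed example data; scales are powers of two so the result is exact.
x = [[0.5, -1.0], [0.25, 0.75]]
w = [[1.0, 0.5], [-0.5, 0.25]]
x_scale = 1 / 64
w_scale = 1 / 64

x_q = quantize(x, x_scale)          # int8-range activations
w_q = quantize(w, w_scale)          # int8-range weights
acc = int_matmul(x_q, w_q)          # integer accumulator
y = dequantize(acc, x_scale * w_scale)  # a single multiply per output element

print(y)
```

In this scheme the dequantization is one multiply per output element applied after the matmul, which is the case the question refers to.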