Open ptillet opened 1 year ago
see https://github.com/openai/triton/pull/759#issuecomment-1275357306
May I ask why this needs special support at the MLIR level? Wouldn't simply multiplying the int8 matmul output by some factor already generate optimized PTX code (for instance, in a quantized linear layer)?
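To make the question concrete, here is a minimal NumPy sketch of the pattern I have in mind: an int8 matmul accumulated in int32, followed by a plain multiplication with a scale factor. The function name `quantized_linear` and the per-tensor scales `x_scale` and `w_scale` are hypothetical and only for illustration. This is not Triton code and says nothing about what PTX the compiler would actually emit.

```python
import numpy as np

def quantized_linear(x_q, w_q, x_scale, w_scale):
    """Hypothetical reference: int8 matmul, then rescale the output."""
    # Accumulate in int32 so that products of int8 values cannot overflow.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    # "Just multiplying the output by some factor": per-tensor dequantization.
    return acc.astype(np.float32) * np.float32(x_scale * w_scale)

# Small example with made-up int8 inputs and scales.
x_q = np.array([[1, -2], [3, 4]], dtype=np.int8)
w_q = np.array([[5, 6], [7, -8]], dtype=np.int8)
y = quantized_linear(x_q, w_q, x_scale=0.5, w_scale=0.25)
```

The open question is whether expressing the second step as an ordinary elementwise multiply after the dot product is enough, or whether the compiler needs to know about the scaling to fuse it well.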
see https://github.com/openai/triton/pull/759#issuecomment-1275357306