microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Gemm layer is not quantized with QGemm node but with QLinearMatMul + QLinearAdd #10278

Open · ghost opened this issue 2 years ago

ghost commented 2 years ago

Hi,

I'm working with a simple ONNX network exported from PyTorch. The last fully connected layer (with bias) is exported as a Gemm node. After quantization with quantize_static in the latest onnxruntime version (1.10), I was expecting the new QGemm node, but the quantized model still splits it into two nodes: QLinearMatMul + QLinearAdd. Why is that? How can I get the more compact QGemm version instead?

Thank you and BR,
Ocaf

System information
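For context, here is a minimal sketch of the kind of quantize_static call described above, assuming an illustrative FP32 model path and a placeholder random-data calibration reader (neither is taken from the issue):

```python
# Minimal sketch only: model paths, input name/shape, and the calibration
# reader are illustrative placeholders, not taken from the issue.
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few random batches as calibration data (placeholder only)."""
    def __init__(self, input_name, shape, num_batches=8):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        # Return the next calibration feed, or None when exhausted.
        return next(self._batches, None)

quantize_static(
    "model_fp32.onnx",                    # hypothetical input path
    "model_int8.onnx",                    # hypothetical output path
    RandomCalibrationReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QOperator,   # operator-oriented output (QLinear*-style nodes)
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)
```

QuantFormat.QOperator selects the operator-oriented output (QLinear*-style nodes) rather than the QDQ format, which is the format the question is about.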

ghost commented 2 years ago

Hi! Is there any way to force the use of the QGemm node? The extra node created by the MatMul + Add split is not optimized, and using QGemm would make it easier to port customized code to a CPU target.

BR,
Ocaf
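As a quick sanity check, here is a hedged sketch of how one might count the quantized operators that actually ended up in the exported graph; the file name model_int8.onnx is illustrative:

```python
# Count the quantized operator types in the exported graph; the file name
# is an illustrative placeholder.
import collections
import onnx

model = onnx.load("model_int8.onnx")
op_counts = collections.Counter(node.op_type for node in model.graph.node)
for op in ("QGemm", "QLinearMatMul", "QLinearAdd"):
    print(op, op_counts.get(op, 0))
```

If QGemm comes back as 0 while both QLinearMatMul and QLinearAdd are present, the Gemm layer was split as described above.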

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.