microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Gemm layer is not quantized with QGemm node but with QLinearMatMul + QLinearAdd #10278

Open · ghost opened this issue 2 years ago

ghost commented 2 years ago

Hi,

I'm working with a simple ONNX network exported from PyTorch. The last fully connected layer (with bias) is exported as a Gemm node. After quantization with quantize_static in the latest onnxruntime version (1.10), I was expecting the new QGemm node, but the quantized model still splits it into two nodes: QLinearMatMul + QLinearAdd. Why is that? How can I get the more compact QGemm version instead?

Thank you and BR,
Ocaf

System information
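For context, here is a minimal sketch of the kind of quantize_static call described above, assuming an illustrative FP32 model path and a placeholder random-data calibration reader (neither is taken from the issue):

```python
# Minimal sketch only: model paths, input name/shape, and the calibration
# reader are illustrative placeholders, not taken from the issue.
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few random batches as calibration data (placeholder only)."""
    def __init__(self, input_name, shape, num_batches=8):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        # Return the next calibration feed, or None when exhausted.
        return next(self._batches, None)

quantize_static(
    "model_fp32.onnx",                    # hypothetical input path
    "model_int8.onnx",                    # hypothetical output path
    RandomCalibrationReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QOperator,   # operator-oriented output (QLinear*-style nodes)
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)
```

QuantFormat.QOperator selects the operator-oriented output (QLinear*-style nodes) rather than the QDQ format, which is the format the question is about.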

ghost commented 2 years ago

Hi! Is there any way to force the use of the QGemm node? The extra node created by the MatMul + Add split is not optimized, and using QGemm would make it easier to port customized code to a CPU target.

BR,
Ocaf
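As a quick sanity check, here is a hedged sketch of how one might count the quantized operators that actually ended up in the exported graph; the file name model_int8.onnx is illustrative:

```python
# Count the quantized operator types in the exported graph; the file name
# is an illustrative placeholder.
import collections
import onnx

model = onnx.load("model_int8.onnx")
op_counts = collections.Counter(node.op_type for node in model.graph.node)
for op in ("QGemm", "QLinearMatMul", "QLinearAdd"):
    print(op, op_counts.get(op, 0))
```

If QGemm comes back as 0 while both QLinearMatMul and QLinearAdd are present, the Gemm layer was split as described above.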

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.