ghost opened this issue 2 years ago · Open
Hi! Is there any way to force the use of the QGemm module? The extra node caused by the "MatMul + Add" split is not optimized, and using QGemm would make it easier to port customized code to a CPU target. BR, Ocaf
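For reference, a minimal sketch (assuming the quantized model has been saved to a hypothetical path `model_quant.onnx`) of how one can check which quantized operators actually ended up in the graph, i.e. whether a QGemm node was emitted or the Gemm was split into QLinearMatMul + QLinearAdd:

```python
import onnx
from collections import Counter

# Load the quantized model (the path is illustrative, not from the original report).
model = onnx.load("model_quant.onnx")

# Count operator types in the graph to see whether QGemm appears,
# or whether the original Gemm was lowered to QLinearMatMul + QLinearAdd.
op_counts = Counter(node.op_type for node in model.graph.node)
print(op_counts)
```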
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Hi,
I work with a simple ONNX network exported from PyTorch. The last fully connected layer (with bias) is exported as a Gemm node. After quantization (quantize_static) with the latest onnxruntime version (1.10), I was expecting the new QGemm layer, but the quantized version is still split into two layers: QLinearMatMul + QLinearAdd. Why is that? How can I get the more compact QGemm version instead? Thank you and BR, Ocaf
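For context, here is a minimal sketch of the workflow described above (the toy model, file names, input shapes, and calibration reader are illustrative placeholders, not the actual code from this report):

```python
import torch
import torch.nn as nn
from onnxruntime.quantization import CalibrationDataReader, quantize_static


# A toy network whose last fully connected layer (with bias) is exported as a Gemm node.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 10)  # bias=True by default -> Gemm in the ONNX graph

    def forward(self, x):
        return self.fc(x)


# Export the PyTorch model to ONNX.
torch.onnx.export(
    Net(),
    torch.randn(1, 128),
    "model.onnx",
    input_names=["input"],
    opset_version=13,
)


# Minimal calibration data reader required by quantize_static (random data for illustration).
class RandomCalibReader(CalibrationDataReader):
    def __init__(self, n=8):
        self.data = iter(
            [{"input": torch.randn(1, 128).numpy()} for _ in range(n)]
        )

    def get_next(self):
        return next(self.data, None)


# Static quantization; with onnxruntime 1.10 the Gemm node was observed to become
# QLinearMatMul + QLinearAdd in the output graph rather than a single QGemm node.
quantize_static("model.onnx", "model_quant.onnx", RandomCalibReader())
```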
System information