Closed — ofirzaf closed this issue 3 years ago
optimizer.optimize_model fuses subgraphs into customized operators. ONNX shape inference doesn't work for these customized operators, so the calibration tool cannot generate quantization parameters for them. PR #8788 solves the error.
If you want to try static quantization on transformer models, it's better not to use optimizer.optimize_model to fuse the model first; otherwise only a few ops will be quantized, because there is no static quantization support for the fused ops.
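In that case the flow is to call quantize_static directly on the exported (unfused) model with a calibration data reader. A minimal sketch, assuming onnxruntime is installed; the file names, the input name "input_ids", and the batch shape are illustrative placeholders, not taken from the issue:

```python
import numpy as np

class RandomCalibrationReader:
    """Minimal calibration data reader for quantize_static: get_next()
    returns one {input_name: array} batch per call, then None when done."""
    def __init__(self, input_name, num_batches=3):
        # Placeholder calibration data: random token ids of shape (1, 128).
        self._batches = iter(
            {input_name: np.random.randint(0, 100, size=(1, 128), dtype=np.int64)}
            for _ in range(num_batches)
        )

    def get_next(self):
        return next(self._batches, None)


def quantize(model_in="model.onnx", model_out="model-quant.onnx"):
    # Imported lazily so the reader above is usable without onnxruntime.
    from onnxruntime.quantization import quantize_static
    quantize_static(model_in, model_out, RandomCalibrationReader("input_ids"))
```

In a real setup the reader should feed representative data (e.g. tokenized validation sentences) rather than random ids, since the calibration batches determine the quantization ranges.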
Fixed with #8788.
Describe the bug
I tried to perform static quantization (not dynamic) on a transformer model, following your guide for quantizing BERT, and got the following error.
System information
To Reproduce
I used the following code:
Note that the ONNX model and the optimized model generated by the export and optimizer functions produce the expected results when run with onnxruntime.
Running this code produced the following traceback:
Expected behavior
The function quantize_static should save a quantized ONNX model at the stated path.