microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

inference qdq model failed with TRT EP. #10743

Open pycoco opened 2 years ago

pycoco commented 2 years ago

The error is: [ERROR] 4: [standardEngineBuilder.cpp::initCalibrationParams::1402] Error Code 4: Internal Error (Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.) Also, are there any tools or parameters in Python for debugging this?

yufenglee commented 2 years ago

The TRT EP only supports quantization formats with symmetric activations and weights. What quantization command did you use? Could you please refer to the example here: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/image_classification/trt/resnet50

pycoco commented 2 years ago

https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/image_classification/trt/resnet50

I used TensorRT's pytorch-quantization toolkit to export the ONNX model, so it is a quantization format with symmetric activations and weights. Also, this example is PTQ, not QDQ, right?

yufenglee commented 2 years ago

It is in QDQ format too. @chilo-ms @stevenlix to help on the error.

chilo-ms commented 2 years ago

@pycoco How did you run the QDQ model with TRT EP? Which ORT and TRT versions did you use? Here is an end-to-end BERT QDQ example: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/nlp/bert/trt/e2e_tensorrt_bert_example.py

Make sure the right QDQ nodes are being inserted (in our example code, at line 265 the calibration generates dynamic ranges for the tensors and then starts inserting QDQ nodes based on the compute/dynamic range). Also make sure the int8 flag is set.

pycoco commented 2 years ago

pytorch-quantization tools

Thanks for your reply. I inserted the QDQ nodes with the pytorch-quantization tools; I will try these steps with the onnxruntime pipeline.

pycoco commented 2 years ago

@yufenglee @chilo-ms Does onnxruntime not support quantizing only part of a model? When I set op_types_to_quantize=['Conv'] in create_calibrator(), this error also happens, but when I pass [] to create_calibrator(), everything is OK.