Open pycoco opened 2 years ago
TRT EP only supports the quantization format with symmetric activations and weights. Which quantization command did you use? Could you please refer to the example here: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/image_classification/trt/resnet50
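For readers unfamiliar with the term, a minimal sketch (plain Python, values are illustrative) of what "symmetric" means here: the zero point is fixed at 0, so the int8 scale is derived from the larger absolute bound of the tensor's dynamic range.

```python
def symmetric_int8_scale(rmin: float, rmax: float) -> float:
    """Symmetric int8: zero_point is 0, so one scale covers the larger |bound|."""
    return max(abs(rmin), abs(rmax)) / 127.0

def quantize_symmetric(x: float, scale: float) -> int:
    """Map a float to int8 with zero_point = 0 (the scheme TRT EP expects)."""
    q = round(x / scale)
    return max(-127, min(127, q))  # clamp to the symmetric int8 range

# A tensor with dynamic range [-2.54, 1.0] gets scale 2.54 / 127 = 0.02
scale = symmetric_int8_scale(-2.54, 1.0)
assert abs(scale - 0.02) < 1e-9
assert quantize_symmetric(1.0, scale) == 50
assert quantize_symmetric(-10.0, scale) == -127  # saturates, zero_point stays 0
```

An asymmetric scheme would instead shift values with a nonzero zero point, which is what the TRT EP rejects.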
I used the pytorch-quantization toolkit from TensorRT to export the ONNX model, so it is in the quantization format with symmetric activations and weights. Also, that example is PTQ, not QDQ, right?
It is in QDQ format too. @chilo-ms @stevenlix could you help with the error?
@pycoco How do you run the QDQ model with TRT EP? Which ORT and TRT versions did you use? Here is an end-to-end BERT QDQ example: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/nlp/bert/trt/e2e_tensorrt_bert_example.py
Make sure the right QDQ nodes are being inserted (in our example code, at line 265, calibration generates dynamic ranges for the tensors and then QDQ nodes are inserted based on those computed ranges). Also make sure the int8 flag is set.
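The int8 flag mentioned above can be set through the TensorRT EP's provider options. A sketch, assuming an ORT build with the TensorRT EP available (session creation is commented out because it needs a real model and GPU; the calibration table name is illustrative):

```python
# Provider options for the TensorRT EP: trt_int8_enable turns on int8 mode,
# and trt_int8_calibration_table_name points at the table the calibrator
# wrote (needed for implicit quantization; a QDQ model carries its own scales).
trt_provider_options = {
    "trt_int8_enable": True,
    "trt_int8_use_native_calibration_table": False,
    "trt_int8_calibration_table_name": "calibration.flatbuffers",
}

# import onnxruntime as ort
# sess = ort.InferenceSession(
#     "model_qdq.onnx",
#     providers=[("TensorrtExecutionProvider", trt_provider_options),
#                "CUDAExecutionProvider"],
# )
```

If `trt_int8_enable` is not set, TRT builds an fp32/fp16 engine and silently ignores the calibration data.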
> pytorch-quantization tools
Thanks for your reply. I inserted the QDQ nodes with the pytorch-quantization tools. I will try these steps with the onnxruntime pipeline.
@yufenglee @chilo-ms Does onnxruntime not support quantizing only part of a model? When I set op_types_to_quantize=['Conv'] in create_calibrator(), this error also happens, but when I pass [] to create_calibrator(), everything is OK.
The error is:

```
[ERROR] 4: [standardEngineBuilder.cpp::initCalibrationParams::1402] Error Code 4: Internal Error (Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.)
```

Also, are there any tools or parameters for debugging this in Python?
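That TRT error means the engine builder found no scales for the network's tensors, i.e. the calibrator's computed ranges never reached TRT. One cheap debugging step is to check the range dict before building the engine. A sketch with a hypothetical `validate_ranges` helper (plain Python; it assumes a `{tensor_name: (min, max)}` dict like the one a calibrator's `compute_range()` returns):

```python
def validate_ranges(ranges: dict) -> None:
    """Fail early if calibration produced no usable dynamic ranges, which
    otherwise surfaces later as TRT's 'no scaling factors detected' error."""
    if not ranges:
        raise ValueError("Calibration produced no dynamic ranges; "
                         "check op_types_to_quantize and the calibration data.")
    for name, (rmin, rmax) in ranges.items():
        if rmin == rmax == 0.0:
            raise ValueError(f"Tensor '{name}' has a degenerate (0, 0) range.")

# A populated range dict passes silently:
validate_ranges({"conv1_out": (-2.5, 2.5)})

# An empty one (the failure mode in this thread) raises before TRT is involved:
try:
    validate_ranges({})
    empty_rejected = False
except ValueError:
    empty_rejected = True
assert empty_rejected
```

If the dict is empty when you restrict the op types, the calibrator likely collected no tensors for those ops, so TRT sees no scales at build time.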