mengniwang95 opened this issue 2 years ago
TRT quantization on GPU is a little different from CPU quantization. Please refer to these examples:
https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/object_detection/trt/yolov3/e2e_user_yolov3_example.py
https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/image_classification/trt/resnet50/e2e_tensorrt_resnet_example.py
For CNN models, the calibration approach should be good enough and no QDQ model is needed, as shown in the examples.
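For reference, the calibration-based flow in those examples ends with the TensorRT EP consuming a calibration table directly. A minimal sketch of that last step is below; the model path and calibration table file name are placeholders, and passing the options as TensorRT EP provider options is one way to do it (the linked examples use the equivalent ORT_TENSORRT_* environment variables):

import onnxruntime as ort

# Sketch: run an INT8-calibrated (non-QDQ) model on the TensorRT EP.
# "resnet50.onnx" and "calibration.flatbuffers" are placeholder names.
trt_options = {
    "trt_int8_enable": True,  # enable INT8 in the TensorRT builder
    "trt_int8_calibration_table_name": "calibration.flatbuffers",
    "trt_engine_cache_enable": True,
}
session = ort.InferenceSession(
    "resnet50.onnx",
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)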
Hi, I also tried the examples you shared and they run successfully. I want to learn how to generate a QDQ model that can run with the TRT EP. Does it have any restrictions?
Describe the issue
2022-10-20 09:21:09.531367276 [E:onnxruntime:Default, tensorrt_execution_provider.h:58 log] [2022-10-20 09:21:09 ERROR] 4: [network.cpp::validate::2891] Error Code 4: Internal Error (Int8 precision has been set for a layer or layer output, but int8 is not configured in the builder)
Exception
Traceback (most recent call last):
  File "/workspace/mengniwa/test/tools/transformers/benchmark_helper.py", line 135, in create_onnxruntime_session
    session = InferenceSession(onnx_model_path, sess_options, providers=providers)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 395, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.EPFail: [ONNXRuntimeError] : 11 : EP_FAIL : TensorRT EP could not build engine for fused node: TensorrtExecutionProvider_TRTKernel_graph_mxnet_converted_model_6241742266258321209_75_0
To reproduce
https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/image_classification/cpu
Generate a QDQ ResNet-50 model following the above link (the generated model is too large to upload). In the quantize_static API I use the QDQ quant format, per-tensor quantization, and int8 weights.
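The quantization call described above would look roughly like this. It is a minimal sketch: the model paths are placeholders and the ResNet50DataReader class is a hypothetical stand-in for the calibration data reader used in the linked example.

from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

class ResNet50DataReader(CalibrationDataReader):
    """Hypothetical calibration reader; yields {input_name: ndarray} dicts."""
    def __init__(self, batches):
        self._iter = iter(batches)
    def get_next(self):
        return next(self._iter, None)

# QDQ format, per-tensor (per_channel=False), int8 weights, as described above.
# Real preprocessed image batches must be supplied to the data reader.
quantize_static(
    model_input="resnet50_fp32.onnx",        # placeholder path
    model_output="resnet50_qdq_int8.onnx",   # placeholder path
    calibration_data_reader=ResNet50DataReader(batches=[]),
    quant_format=QuantFormat.QDQ,
    per_channel=False,
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
)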
Then I build ORT with the TRT EP enabled from the latest main branch, and use https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/profiler.py to run the test, with all optimizations disabled.
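The reproduction step amounts to something like the following sketch; the model path is a placeholder, and disabling graph optimizations mirrors the "disable all optimization" setting described above. This session creation is where the EP_FAIL error in the traceback is raised.

import onnxruntime as ort

sess_options = ort.SessionOptions()
# Disable all ORT graph optimizations, as in the profiler run described above.
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

# Load the QDQ model with the TensorRT EP first in the provider list.
session = ort.InferenceSession(
    "resnet50_qdq_int8.onnx",  # placeholder path
    sess_options,
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)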
Urgency
urgent
Platform
Other / Unknown
OS Version
https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_22-07.html#rel_22-07
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
67150baa8d50e74f3a8a7b8e679a8d31eae4c0ed
ONNX Runtime API
Python
Architecture
X86
Execution Provider
TensorRT
Execution Provider Library Version
No response