microsoft / onnxruntime-inference-examples

Examples for using ONNX Runtime for machine learning inferencing.

Can a model quantized with the onnxruntime TensorRT EP be run directly with TensorRT? #101

Open lucasjinreal opened 2 years ago

lucasjinreal commented 2 years ago

I have an int8 model quantized with the TensorRT EP, and I want to run inference on it directly with TensorRT rather than through onnxruntime. Is that possible?

cloudhan commented 2 years ago

The code is open source, so it is not hard to find the answer in it. Based on the following implementation detail https://github.com/microsoft/onnxruntime/blob/60db14bac39d5b0e7809d604d38e5b9aadf1df6e/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc#L1558-L1568 : if engine encryption is not enabled, then the answer is simply yes.
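A minimal sketch of what this implies in practice, assuming the whole graph is assigned to the TensorRT EP and using placeholder model paths, cache directory, and input shape (the cached engine file name is generated by onnxruntime, so the glob below is an assumption):

```python
import glob
import numpy as np
import onnxruntime as ort
import tensorrt as trt

# 1) Let the TensorRT EP build and serialize the engine to disk by
#    enabling engine caching through provider options.
session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=[(
        "TensorrtExecutionProvider",
        {
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": "./trt_cache",
            "trt_int8_enable": True,  # if the model is quantized to int8
        },
    )],
)
# Run once so the engine is actually compiled and written to the cache dir.
dummy = {session.get_inputs()[0].name: np.zeros([1, 3, 224, 224], dtype=np.float32)}
session.run(None, dummy)

# 2) Deserialize the cached engine with the TensorRT runtime, with no
#    onnxruntime involved. The exact file name is generated by ORT, so just
#    pick up the .engine file from the cache directory.
engine_path = glob.glob("./trt_cache/*.engine")[0]
logger = trt.Logger(trt.Logger.WARNING)
with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
# `engine` can now be used to create an execution context and run inference
# with the TensorRT API alone.
```

Note that if the graph is partitioned into several TensorRT subgraphs, one engine is cached per subgraph, so running a single engine standalone only covers that subgraph.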

lucasjinreal commented 2 years ago

@cloudhan thanks. May I ask further: what are the pros and cons of onnxruntime quantization with the TensorRT EP compared to TensorRT's built-in int8 quantization?
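For context, the onnxruntime side of that comparison typically means enabling int8 in the TensorRT EP and pointing it at a calibration table, either one produced by onnxruntime's calibration tooling or a native TensorRT calibration cache, whereas standalone TensorRT would use its own IInt8Calibrator (or trtexec --int8) at engine-build time. A hedged sketch of the provider options involved, with the table file name as an assumption:

```python
import onnxruntime as ort

int8_provider_options = {
    "trt_int8_enable": True,
    # Name of the calibration table file; "calibration.flatbuffers" is the
    # name commonly produced by ORT's calibration helpers, adjust as needed.
    "trt_int8_calibration_table_name": "calibration.flatbuffers",
    # Set True to reuse a calibration cache generated by native TensorRT instead.
    "trt_int8_use_native_calibration_table": False,
}

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[("TensorrtExecutionProvider", int8_provider_options)],
)
```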