microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Any difference between onnxruntime+tensorrt and tensorrt only in INT8 mode? #12725

Closed: pycoco closed this issue 2 years ago

pycoco commented 2 years ago

I generate an engine cache with the onnxruntime + TensorRT EP, but the INT8 engine and the FP16 engine end up the same size. When I use trtexec to generate an INT8 engine, the engine size looks correct. What is different when the engine cache is generated through the onnxruntime + TensorRT EP?
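
(For context, a minimal sketch of how the engine cache and reduced-precision modes are typically enabled through the TensorRT EP from Python. The option names follow the ONNX Runtime TensorRT EP documentation; the model path, cache directory, and calibration table name are placeholders.)

```python
import onnxruntime as ort

# Placeholder model path; substitute the actual ONNX model.
model_path = "model.onnx"

# TensorRT EP provider options (names per the ONNX Runtime TensorRT EP docs).
trt_options = {
    "trt_engine_cache_enable": True,         # serialize the built engine to disk
    "trt_engine_cache_path": "./trt_cache",  # placeholder cache directory
    "trt_fp16_enable": True,                 # allow FP16 kernels
    # For INT8 via a calibration table, the flag alone is not enough;
    # the table name must also be supplied:
    "trt_int8_enable": True,
    "trt_int8_calibration_table_name": "calibration.flatbuffers",  # placeholder name
}

session = ort.InferenceSession(
    model_path,
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)
```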


pycoco commented 2 years ago

@yufenglee could you give me some suggestions?

jywu-msft commented 2 years ago

Can you provide some more details about how you are enabling the INT8/quantized model with the onnxruntime + TensorRT EP? Are you using a calibration table or a QDQ model? Reference: https://github.com/microsoft/onnxruntime/issues/11873#issuecomment-1160677578
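
(For reference, the QDQ route mentioned above is usually produced with ONNX Runtime's static quantization tooling. A minimal sketch, assuming a placeholder model, a hypothetical input name and shape, and random calibration data; a real data reader should feed representative samples.)

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

# Hypothetical calibration reader: the input name and shape are placeholders
# and must match the model being quantized.
class RandomDataReader(CalibrationDataReader):
    def __init__(self, num_samples=8):
        self._samples = iter(
            {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        return next(self._samples, None)

quantize_static(
    "model.onnx",       # placeholder FP32 model
    "model_qdq.onnx",   # output model with QuantizeLinear/DequantizeLinear pairs
    calibration_data_reader=RandomDataReader(),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)
```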

pycoco commented 2 years ago

I use a calibration table, and this problem has already been fixed. Thanks.