microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Any difference between onnxruntime+tensorrt and tensorrt only in INT8 mode? #12725

Closed: pycoco closed this issue 2 years ago

pycoco commented 2 years ago

I generate an engine cache with the onnxruntime + TensorRT EP, but the INT8 engine and the FP16 engine end up the same size. When I use trtexec to generate an INT8 engine, the engine size looks correct. What is different when the engine cache is generated through the onnxruntime + TensorRT EP?
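
(For context, a minimal sketch of how the engine cache and reduced-precision modes are typically enabled through the TensorRT EP from Python. The option names follow the ONNX Runtime TensorRT EP documentation; the model path, cache directory, and calibration table name are placeholders.)

```python
import onnxruntime as ort

# Placeholder model path; substitute the actual ONNX model.
model_path = "model.onnx"

# TensorRT EP provider options (names per the ONNX Runtime TensorRT EP docs).
trt_options = {
    "trt_engine_cache_enable": True,         # serialize the built engine to disk
    "trt_engine_cache_path": "./trt_cache",  # placeholder cache directory
    "trt_fp16_enable": True,                 # allow FP16 kernels
    # For INT8 via a calibration table, the flag alone is not enough;
    # the table name must also be supplied:
    "trt_int8_enable": True,
    "trt_int8_calibration_table_name": "calibration.flatbuffers",  # placeholder name
}

session = ort.InferenceSession(
    model_path,
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)
```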


pycoco commented 2 years ago

@yufenglee could you give me some suggestions?

jywu-msft commented 2 years ago

Can you provide some more details about how you are enabling the INT8/quantized model with the onnxruntime + TensorRT EP? Are you using a calibration table or a QDQ model? Reference: https://github.com/microsoft/onnxruntime/issues/11873#issuecomment-1160677578
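
(For reference, the QDQ route mentioned above is usually produced with ONNX Runtime's static quantization tooling. A minimal sketch, assuming a placeholder model, a hypothetical input name and shape, and random calibration data; a real data reader should feed representative samples.)

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

# Hypothetical calibration reader: the input name and shape are placeholders
# and must match the model being quantized.
class RandomDataReader(CalibrationDataReader):
    def __init__(self, num_samples=8):
        self._samples = iter(
            {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        return next(self._samples, None)

quantize_static(
    "model.onnx",       # placeholder FP32 model
    "model_qdq.onnx",   # output model with QuantizeLinear/DequantizeLinear pairs
    calibration_data_reader=RandomDataReader(),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)
```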

pycoco commented 2 years ago

I use a calibration table, and this problem has already been fixed. Thanks.