microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Onnxruntime TensorRT creates one cache when models have the same structure #6455

Open KyloEntro opened 3 years ago

KyloEntro commented 3 years ago

Describe the bug: Enable the Onnxruntime TensorRT engine cache and run inference with 2 models. Both models are mobilenetv3; only the dataset used for training differs. Onnxruntime TensorRT generates only one engine cache file, and the inference results are identical when they should differ.

System information

To Reproduce: Take 2 models (here mobilenet v3), say mobilenetv3_1 and mobilenetv3_2. On the same image, mobilenetv3_1 returns result_1 and mobilenetv3_2 returns result_2, which is different from result_1. Enable ORT_TENSORRT_ENGINE_CACHE_ENABLE, then run inference with these 2 models. Only one engine cache file is generated, and the 2 models give the same result; a minimal sketch of this is shown below.
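A minimal repro sketch in Python (the model filenames, input shape, and random test input are placeholders; assumes onnxruntime-gpu built with the TensorRT execution provider):

```python
import os
import numpy as np
import onnxruntime as ort

# Enable the TensorRT engine cache for every session in this process
os.environ["ORT_TENSORRT_ENGINE_CACHE_ENABLE"] = "1"

sess1 = ort.InferenceSession("mobilenetv3_1.onnx",
                             providers=["TensorrtExecutionProvider"])
sess2 = ort.InferenceSession("mobilenetv3_2.onnx",
                             providers=["TensorrtExecutionProvider"])

# Placeholder input standing in for the real image
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
input_name = sess1.get_inputs()[0].name

result_1 = sess1.run(None, {input_name: image})
result_2 = sess2.run(None, {input_name: image})

# Observed: only one engine cache file on disk, and the two results match
print(np.allclose(result_1[0], result_2[0]))
```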

Expected behavior: 2 engine files are generated and the inference results are different, the same as when the engine cache is disabled.

stevenlix commented 3 years ago

You can specify a different engine cache path for each model by using ORT_TENSORRT_CACHE_PATH. Details about the environment variables can be found here: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/TensorRT-ExecutionProvider.md#configuring-environment-variables
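A sketch of that suggestion in Python, assuming ORT_TENSORRT_CACHE_PATH is read when each session is created (the cache directory names and model filenames are placeholders):

```python
import os
import onnxruntime as ort

os.environ["ORT_TENSORRT_ENGINE_CACHE_ENABLE"] = "1"

# Point each session at its own cache directory before creating it,
# so the two models cannot overwrite or reuse each other's engine files
os.environ["ORT_TENSORRT_CACHE_PATH"] = "trt_cache/model1"
sess1 = ort.InferenceSession("mobilenetv3_1.onnx",
                             providers=["TensorrtExecutionProvider"])

os.environ["ORT_TENSORRT_CACHE_PATH"] = "trt_cache/model2"
sess2 = ort.InferenceSession("mobilenetv3_2.onnx",
                             providers=["TensorrtExecutionProvider"])
```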

stevenlix commented 3 years ago

Also, if possible, please upgrade to ORT 1.6.

KyloEntro commented 3 years ago

Hi! Thanks for your help. I don't see how ORT_TENSORRT_CACHE_PATH can solve my issue; my program uses the 2 models at the same time.

OK, I will try ORT 1.6; it seems it will generate 2 engine files if the shapes are different. In my case, the output shape is different.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.