Open talmaj-at-hypothetic opened 1 year ago
I recommend using different engine cache path for different profile like this: https://github.com/microsoft/onnxruntime/blob/21a71d52bd2074b770807b209939ec11e2c64fa7/onnxruntime/python/tools/transformers/models/stable_diffusion/onnxruntime_tensorrt_txt2img.py#L94
The current TRT EP engine-cache logic is: if trt_engine_cache_enable is on and there are matching engine/profile cache files in the cache path, TRT EP loads those files.
As tianleiwu suggested, could you use a different trt_engine_cache_path (i.e., maintain different folders) for each inference session if the EP options differ?
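The suggestion above can be sketched as a small helper that derives a separate cache folder from the profile-shape options, so sessions with different profiles never clash. The hash-based folder naming and the helper itself are our own convention for illustration, not part of ONNX Runtime's API; only the `trt_engine_cache_enable`, `trt_engine_cache_path`, and `trt_profile_*_shapes` keys are real TensorRT EP options.

```python
import hashlib
import os
import tempfile

def trt_provider_options(cache_root, profile_shapes):
    """Build TensorRT EP options with a cache directory unique to the profile.

    `profile_shapes` holds the trt_profile_*_shapes strings. Hashing them into
    the folder name is our own convention, not something ONNX Runtime does.
    """
    key = hashlib.sha1(repr(sorted(profile_shapes.items())).encode()).hexdigest()[:8]
    cache_path = os.path.join(cache_root, key)
    os.makedirs(cache_path, exist_ok=True)
    return {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": cache_path,
        **profile_shapes,
    }

root = tempfile.mkdtemp()
opts = trt_provider_options(root, {
    "trt_profile_min_shapes": "input:1x3x224x224",
    "trt_profile_opt_shapes": "input:8x3x224x224",
    "trt_profile_max_shapes": "input:32x3x224x224",
})
# Sessions created with different profile shapes now write to different
# folders (creating the session requires a GPU with TensorRT):
# sess = onnxruntime.InferenceSession("model.onnx",
#     providers=[("TensorrtExecutionProvider", opts)])
```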
I am keeping the same ep options. This part is basically narrowing down / debugging the problem:
If I first build the cache with dynamic shapes and then do not define the trt_profile shapes in the next inference, the cache is used.
So we would want the last behaviour, but with the dynamic-shape EP options always on.
Does that make it clearer?
> If I keep the same ep options with dynamic shapes, it also always rebuilds the cache, even for the same dynamic input.
This behavior is odd; the engine cache shouldn't be rebuilt if the dynamic input is the same across inference runs. Could you turn on verbose mode and share the log? Or could you share the model so that I can try to reproduce it?
> If I first build the cache with dynamic shapes and then switch off the dynamic shapes, it always loads the cache for predefined shapes.
What do you mean by switching off the dynamic shapes? Does it mean not providing the trt_profile_xxx_shapes EP options? If so, the result you were seeing is expected. The engine cache built in the first inference run has the associated shape ranges for each dynamic input stored in the xxxx.profile file. In the second inference run, TRT EP compares the current input shape with the ranges in the xxxx.profile; if it is within range, TRT EP won't rebuild the engine and will use the cache directly. Only if the current input shape is out of range will TRT EP rebuild the engine.
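The reuse decision described above can be illustrated with a small sketch. This is only a model of the behavior: the real check happens inside the TensorRT EP, which reads the stored ranges from the xxxx.profile file; the function name and the per-dimension tuples here are invented for illustration.

```python
# Illustrative sketch of the cache-reuse rule: the engine is rebuilt only
# when some dimension of the current input falls outside the cached range.
def engine_needs_rebuild(input_shape, min_shape, max_shape):
    """Return True when any dimension is outside the cached profile range."""
    return any(not (lo <= d <= hi)
               for d, lo, hi in zip(input_shape, min_shape, max_shape))

# An in-range input reuses the cached engine:
assert engine_needs_rebuild((8, 3, 224, 224),
                            (1, 3, 224, 224), (32, 3, 224, 224)) is False
# An out-of-range batch size triggers a rebuild:
assert engine_needs_rebuild((64, 3, 224, 224),
                            (1, 3, 224, 224), (32, 3, 224, 224)) is True
```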
Using onnxruntime with TensorrtExecutionProvider rebuilds the engine cache on every run when you pass trt_profile_min_shapes, trt_profile_opt_shapes, and trt_profile_max_shapes. If I first build the cache with dynamic shapes and then do not define the trt_profile shapes in the next inference, the cache is used.
I am using:
onnxruntime-gpu==1.15.0
To reproduce
Run the example twice:
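The original example code is not included here; the following is a hypothetical minimal sketch of the setup being described. The model path, input name, and shapes are placeholders, and the session/run lines are left commented because they require a GPU with TensorRT; only the `trt_engine_cache_*` and `trt_profile_*_shapes` keys are real TensorRT EP options.

```python
# Hypothetical repro sketch: TensorRT EP options with explicit profile shapes.
trt_options = {
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "./trt_cache",
    "trt_profile_min_shapes": "input:1x3x224x224",   # placeholder input name/shapes
    "trt_profile_opt_shapes": "input:8x3x224x224",
    "trt_profile_max_shapes": "input:32x3x224x224",
}
# import numpy as np
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx",
#     providers=[("TensorrtExecutionProvider", trt_options)])
# sess.run(None, {"input": np.zeros((8, 3, 224, 224), np.float32)})
# On the second run of this script the cached engine should be loaded,
# but with trt_profile_*_shapes set it is rebuilt instead.
```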
Urgency
Low.
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu==1.15.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
docker image: nvcr.io/nvidia/tensorrt:22.12-py3