**OvervCW** opened this issue 1 year ago
> **jywu-msft:** We fixed some issues with the engine cache hash generation. Can you try with ONNX Runtime 1.14?
>
> **OvervCW:** @jywu-msft It looks like that would solve our problem, yes. Unfortunately I can't test it right now, since we're using ONNX Runtime via Triton and Triton hasn't been updated yet.
Describe the issue
I'm loading an ONNX model with the TensorRT execution provider enabled. I've set the ONNX Runtime graph optimization level to -1 to minimize the number of optimizations performed outside TensorRT, and I've given the execution provider a directory in which to cache engine files.
I expected it to build the engine once, when the model is loaded for the first time, and then never again. However, for some models it rebuilt the engine 2 or 3 times, because it can produce slightly different engine names from the same model on different runs.
For example, for one model it produced the following engine files:
It seems to eventually "converge" and not produce any new engine files, but I don't understand how it can decide on more than one name in the first place. Is there some optimization setting that might cause the model to slightly vary between runs and result in a different hash/name for TensorRT?
On a different note, I've also noticed that the number "9156289995721013131" is identical across all of my ONNX models, even across wildly varying architectures. I assumed this was a hash of the model's contents, but apparently it isn't?
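To illustrate why that shared number can't be a content checksum: any hash over the full serialized model (SHA-256 here, as a stand-in; the actual hash ONNX Runtime uses is internal and not assumed) only collides when the bytes are identical, which wildly different architectures never are.

```python
import hashlib

def model_digest(model_bytes: bytes) -> str:
    """SHA-256 over the serialized model bytes, as a checksum stand-in."""
    return hashlib.sha256(model_bytes).hexdigest()

# Different model bytes yield different digests, so a value that repeats
# across unrelated architectures cannot be a hash of the whole model file.
```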
To reproduce
I've used the following optimization settings:
```
precision_mode: FP16
max_workspace_size_bytes: 1073741824
trt_engine_cache_enable: true
trt_engine_cache_path: /trt_cache/model_name
```
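For reference, a standalone (non-Triton) reproduction of these settings through the ONNX Runtime Python API would look roughly like the sketch below. The model path is a placeholder, the session construction is commented out because it needs a machine with TensorRT available, and I'm assuming the `trt_*` execution provider options map one-to-one to the Triton parameters above.

```python
# TensorRT execution provider options mirroring the settings above
# (placeholder paths; option-name mapping is my assumption).
trt_options = {
    "trt_fp16_enable": True,               # precision_mode: FP16
    "trt_max_workspace_size": 1073741824,  # max_workspace_size_bytes
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "/trt_cache/model_name",
}

# import onnxruntime as ort
# so = ort.SessionOptions()
# so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
# sess = ort.InferenceSession(
#     "model.onnx",
#     sess_options=so,
#     providers=[("TensorrtExecutionProvider", trt_options),
#                "CUDAExecutionProvider"],
# )
```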
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04.5 LTS (Focal Fossa)
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.13.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8.0 / cuDNN 8.6.0.163 / TensorRT 8.5.0.12
Model File
No response
Is this a quantized model?
No