**OvervCW** opened this issue 1 year ago
> **jywu-msft:** We fixed some issues with the engine cache hash generation. Can you try with ONNX Runtime 1.14?
>
> **OvervCW:** @jywu-msft It looks like that would solve our problem, yes. Unfortunately I can't test it right now, since we're using ONNX Runtime via Triton and Triton hasn't been updated yet.
Describe the issue
I'm loading an ONNX model with the TensorRT execution provider enabled. I've set the ONNX Runtime graph optimization level to -1 to minimize the number of optimizations performed outside TensorRT, and I've given the execution provider a directory in which to cache engine files.
I expected it to build the engine once, when the model is loaded for the first time, and then never again. However, for some models it rebuilt the engine 2 or 3 times, because it can produce slightly different engine names from the same model on different runs.
For example, for one model it produced the following engine files:
It seems to eventually "converge" and not produce any new engine files, but I don't understand how it can decide on more than one name in the first place. Is there some optimization setting that might cause the model to slightly vary between runs and result in a different hash/name for TensorRT?
On a different note, I've also noticed that the number "9156289995721013131" is identical across all of my ONNX models, even across wildly varying architectures. I assumed this was a hash of the model's contents, but apparently it isn't?
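To illustrate why that shared number can't be a content checksum: any hash over the full serialized model (SHA-256 here, as a stand-in; the actual hash ONNX Runtime uses is internal and not assumed) only collides when the bytes are identical, which wildly different architectures never are.

```python
import hashlib

def model_digest(model_bytes: bytes) -> str:
    """SHA-256 over the serialized model bytes, as a checksum stand-in."""
    return hashlib.sha256(model_bytes).hexdigest()

# Different model bytes yield different digests, so a value that repeats
# across unrelated architectures cannot be a hash of the whole model file.
```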
To reproduce
I've used the following optimization settings:
```
precision_mode: FP16
max_workspace_size_bytes: 1073741824
trt_engine_cache_enable: true
trt_engine_cache_path: /trt_cache/model_name
```
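For reference, a standalone (non-Triton) reproduction of these settings through the ONNX Runtime Python API would look roughly like the sketch below. The model path is a placeholder, the session construction is commented out because it needs a machine with TensorRT available, and I'm assuming the `trt_*` execution provider options map one-to-one to the Triton parameters above.

```python
# TensorRT execution provider options mirroring the settings above
# (placeholder paths; option-name mapping is my assumption).
trt_options = {
    "trt_fp16_enable": True,               # precision_mode: FP16
    "trt_max_workspace_size": 1073741824,  # max_workspace_size_bytes
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "/trt_cache/model_name",
}

# import onnxruntime as ort
# so = ort.SessionOptions()
# so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
# sess = ort.InferenceSession(
#     "model.onnx",
#     sess_options=so,
#     providers=[("TensorrtExecutionProvider", trt_options),
#                "CUDAExecutionProvider"],
# )
```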
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04.5 LTS (Focal Fossa)
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.13.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
CUDA 11.8.0 / cuDNN 8.6.0.163 / TensorRT 8.5.0.12
Model File
No response
Is this a quantized model?
No