microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Layer not supported by one provider (TensorRT) is not handled by the second provider (CUDA) during inference. #20058

Open rribes opened 5 months ago

rribes commented 5 months ago

Describe the issue

Hello, I am working with an NVIDIA Jetson Orin Nano and I am trying to run inference with onnxruntime on an ONNX model that was converted from PyTorch to ONNX.

The inference is done with the TensorRT execution provider and the CUDA execution provider. With the CUDA execution provider alone the inference runs properly, but with these two providers together it freezes my screen.

From what I know, my model contains a layer that is not supported by TensorRT, but that is exactly why I run the inference with two execution providers: as far as I understand, whatever the first provider (TensorRT) cannot handle, the second provider (CUDA) will handle. The fact is that when I run the inference with these two providers the model does not run properly, while with just the CUDA provider it runs correctly. With just the TensorRT provider it does not work either.

I understand that when I run the inference with two execution providers, the layer that is not supported by TensorRT will be handled by the second provider, CUDA. However, this does not happen when I execute my script.
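
For context, the list of execution providers that the session actually registered can be checked after creation; a minimal sketch (assuming the same `model.onnx` and provider list as in my script), since ONNX Runtime silently drops a provider that fails to initialize and always appends the CPU provider as the last fallback:

```python
import onnxruntime as ort

# Request TensorRT first, with CUDA as the fallback for unsupported nodes.
sess = ort.InferenceSession(
    'model.onnx',
    providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider'],
)

# Providers that were actually registered, in priority order.
# A provider that failed to load will be missing from this list.
print(sess.get_providers())
```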

Thank you in advance for your help.

To reproduce

To do the inference I execute the following line of code:

```python
sess = ort.InferenceSession('model.onnx', providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider'])
```
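
For reference, provider-specific options can also be passed as (name, options) tuples in the same `providers` list; a sketch of that API shape, where the TensorRT option values shown are purely illustrative and not part of the original report:

```python
import onnxruntime as ort

# Same session creation, but with explicit TensorRT EP options.
# The option values below are illustrative only.
providers = [
    ('TensorrtExecutionProvider', {
        'device_id': 0,
        'trt_engine_cache_enable': True,        # cache built engines between runs
        'trt_engine_cache_path': './trt_cache',
    }),
    'CUDAExecutionProvider',
]
sess = ort.InferenceSession('model.onnx', providers=providers)
```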

Urgency

No response

Platform

Linux

OS Version

5.10

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16.1

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

CUDA, TensorRT

Execution Provider Library Version

No response

chilo-ms commented 5 months ago

Could you share the model so that we can repro on our side? Do you know which part it freezes at: session creation or inference? If you can provide the verbose log, that would be good.
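
A verbose log can be produced by lowering the session log severity; a minimal sketch (assuming the same `model.onnx` and provider list as above):

```python
import onnxruntime as ort

# Severity 0 = VERBOSE. The log shows how nodes are partitioned across
# TensorRT / CUDA / CPU and where session creation stops if it hangs.
so = ort.SessionOptions()
so.log_severity_level = 0

sess = ort.InferenceSession(
    'model.onnx',
    sess_options=so,
    providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider'],
)
```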

rribes commented 5 months ago

Hello,

The code that I execute, so you can repro on your side, is the following:

```python
import librosa
import onnxruntime as ort

sess = ort.InferenceSession('model.onnx', providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider'])

# SAMPLE_RATE is defined earlier in my script.
waveform, _ = librosa.core.load('audio.flac', sr=SAMPLE_RATE, mono=True)
waveform = waveform[None, :]  # add a batch dimension

print(f'--- Working with : {ort.get_device()} ---')

onnx_input = {sess.get_inputs()[0].name: waveform}
onnx_output = sess.run(None, onnx_input)
```

The screen freezes at the first line of code (the session creation), and the warnings and errors it shows can be seen in the attached image.

[attached screenshot: warning_error]

You can download the model at this link: https://wetransfer.com/downloads/e2e627e5a3743c42967858f266dfaf3e20240321094126/79b97d

Thank you for your help!

rribes commented 5 months ago

Hello @chilo-ms, do you have any update on the problem? Thanks