microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Do I need to convert data to device for TensorRTExecutionProvider? #13952

Open davodogster opened 1 year ago

davodogster commented 1 year ago

Describe the issue

I trained a PyTorch Lightning model ('arch': 'deeplabv3plus', 'encoder_name': 'tu-resnetrs50') in SMP, converted it to ONNX, and ran shape inference. I am able to run inference with TRT (data on CPU), but it is only about 30% faster than raw PyTorch GPU inference.

The Python API docs only discuss moving data to the device for the CUDA and CPU execution providers, but say nothing about TRT. Can we move data to the CUDA device for the TensorRT EP? Will it speed up inference? How is it done?

Cheers, Sam

To reproduce

(screenshot attachment did not upload)

Urgency

moderate

Platform

Linux

OS Version

Ubuntu 20.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

TensorRT

Execution Provider Library Version

CUDA 11.6 TRT 8.4.1.5

faxu commented 1 year ago

What do you mean by "convert data to device"? The TRT EP must be registered to be used; see https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#python
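
For reference, a minimal sketch of registering the TRT EP from Python (the model path "model.onnx" is a placeholder, not from this thread):

```python
# Minimal sketch, assuming a model file "model.onnx": the TRT EP is tried first,
# with CUDA and CPU as fallbacks.
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)
```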

davodogster commented 1 year ago

@faxu Yes, I know about registering the TRT EP. I mean moving the data to the GPU, as in `.to("cuda")` / `DEVICE = "cuda"` in PyTorch. Thanks

davodogster commented 1 year ago

> What do you mean by "convert data to device"? TRT EP must be registered to be used, see https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#python

image

Can we convert data to CUDA before passing it to the TRT EP? If we can, how do we do it? Thanks

davodogster commented 1 year ago

@faxu

Here is what I'm trying to do:

image

image

image

Thanks, Sam

hariharans29 commented 1 year ago

There are several parts to your question:

1) The error you see above is because you are trying to bind None as an output, which is not a valid use of the API. You have to use the right output name and optionally provide a buffer that the output will be written to. See the illustrative examples in the IOBinding doc you referenced above (a minimal sketch also follows this list).

2) I think your actual question is: "Can we use IOBinding with the TensorRT EP, and will that save a copy compared to feeding data from the CPU to a TensorRT-EP-backed session?" For this, I am tagging the TensorRT EP folks - @jywu-msft @stevenlix
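
As a minimal sketch (not from this thread) of IOBinding with data kept on the GPU: the model path "model.onnx", the input name "input", and the output name "output" are placeholders; use the names your own model reports.

```python
# Minimal sketch: bind a GPU-resident input and a GPU-allocated output via IOBinding.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)

x = np.random.rand(1, 3, 512, 512).astype(np.float32)   # placeholder input batch

# Copy the input to GPU memory once, up front.
x_gpu = ort.OrtValue.ortvalue_from_numpy(x, "cuda", 0)

binding = sess.io_binding()
binding.bind_ortvalue_input("input", x_gpu)   # must match the model's input name
binding.bind_output("output", "cuda")         # let ORT allocate the output on the GPU

sess.run_with_iobinding(binding)
result = binding.copy_outputs_to_cpu()[0]     # fetch the result back as a numpy array
```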

davodogster commented 1 year ago

@hariharans29 OK, do you know how I get the output name? And yes to #2, that is my question. Thanks, Sam

hariharans29 commented 1 year ago

You can use the tool "Netron" to load an ONNX model graph and visually inspect it.

If you want to use the Python API, please check this - https://github.com/microsoft/onnxruntime/blob/61e7636e618514c1d59c6d46bf7ccaa3df7fda88/onnxruntime/test/python/onnxruntime_test_python.py#L397
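
For example, a minimal sketch of listing the input and output names from Python (the model path is a placeholder):

```python
# Minimal sketch, assuming a model file "model.onnx": print the graph's
# input and output names so they can be used with IOBinding.
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()])
print([o.name for o in sess.get_outputs()])
```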