Open · hy846130226 opened this issue 8 months ago
If I disable trt_cuda_graph_enable, I get correct results for every image.
Make sure you use I/O binding to bind input tensors in GPU memory. During inference, copy each input to the same address (the input shape must also stay the same) as the input used in the first inference run.
You can get some ideas from the corresponding Python code: https://github.com/microsoft/onnxruntime/blob/4a196d15940b0f328735c888e2e861d67602ffcf/onnxruntime/python/tools/transformers/io_binding_helper.py#L212-L307
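A minimal C++ sketch of the same pattern, assuming a fixed-shape model with one input named "input" and one output named "output" (the names, shapes, and buffer variables below are hypothetical, and `session` is an already-created `Ort::Session` using the TensorRT EP):

```cpp
#include <onnxruntime_cxx_api.h>
#include <cuda_runtime.h>
#include <vector>

// Memory info describing CUDA device memory on GPU 0.
Ort::MemoryInfo gpu_info("Cuda", OrtDeviceAllocator, /*device_id*/ 0,
                         OrtMemTypeDefault);

// Allocate device buffers ONCE and reuse them for every image.
std::vector<int64_t> in_shape{1, 3, 224, 224};   // assumed fixed shape
size_t in_count = 1 * 3 * 224 * 224;
float* d_input = nullptr;
cudaMalloc(&d_input, in_count * sizeof(float));

std::vector<int64_t> out_shape{1, 1000};         // assumed output shape
size_t out_count = 1000;
float* d_output = nullptr;
cudaMalloc(&d_output, out_count * sizeof(float));

// Wrap the device buffers as Ort::Value without copying.
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
    gpu_info, d_input, in_count, in_shape.data(), in_shape.size());
Ort::Value output_tensor = Ort::Value::CreateTensor<float>(
    gpu_info, d_output, out_count, out_shape.data(), out_shape.size());

// Bind once; keep the binding (and the bound addresses) alive across runs.
Ort::IoBinding binding(session);
binding.BindInput("input", input_tensor);
binding.BindOutput("output", output_tensor);
```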
Hi @tianleiwu
Thanks for your help!
I can use CUDA graphs with TensorRT directly, but I am confused about how to do it in onnx-tensorrt.
I know I'm supposed to copy the input to the same address, but shouldn't this operation be automated by calling this method?
`std::vector<Ort::Value> Session::Run(...)`
But it seems like I need to obtain an IOBinding somehow, and then every time I infer an image I have to rebind the address, even though my address is always the same. (Every time I receive image data, I copy it to the input address; in other words, my buffer is reusable.)
And by the way, how can I get the IOBinding in onnx-tensorrt in C++?
For CUDA graph, you shall create the IO binding only once. On the first call, the CUDA graph will be captured. For the remaining calls, you only need to copy data to the same address and call Run with the IO binding API to replay the captured graph.
An example of I/O binding for TRT in C++ is here: https://github.com/microsoft/onnxruntime/blob/4a196d15940b0f328735c888e2e861d67602ffcf/onnxruntime/test/shared_lib/test_inference.cc#L1897-L1909
An example of CUDA graph usage is here: https://github.com/microsoft/onnxruntime/blob/4a196d15940b0f328735c888e2e861d67602ffcf/onnxruntime/test/shared_lib/test_inference.cc#L1975-L1986
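In loop form, the pattern described above looks roughly like this (a sketch continuing the hypothetical buffers from the earlier snippet; `host_images` and `host_result` stand in for your own host-side data):

```cpp
Ort::RunOptions run_options;
for (int i = 0; i < num_images; ++i) {
  // Copy the new image into the SAME device address that was bound
  // before the first run; do not create a new tensor or rebind.
  cudaMemcpy(d_input, host_images[i], in_count * sizeof(float),
             cudaMemcpyHostToDevice);

  // The first call captures the CUDA graph; later calls replay it.
  session.Run(run_options, binding);

  // Read the result back from the bound output address.
  cudaMemcpy(host_result, d_output, out_count * sizeof(float),
             cudaMemcpyDeviceToHost);
}
```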
Hi @tianleiwu
Thanks for your help!
I modified the code according to the example, but it does not work.
Am I missing something?
@hy846130226, please bind inputs and outputs to buffers in GPU memory instead of CPU memory.
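For clarity, the difference is only in the `OrtMemoryInfo` the tensors are created with; a sketch, reusing the assumed shapes and buffers from the earlier snippets:

```cpp
// CPU-backed tensor: with CUDA graph enabled, the capture/replay path is
// not guaranteed to pick up later host writes, so results can go stale.
Ort::MemoryInfo cpu_info =
    Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
Ort::Value cpu_tensor = Ort::Value::CreateTensor<float>(
    cpu_info, host_buffer, in_count, in_shape.data(), in_shape.size());

// GPU-backed tensor over memory from cudaMalloc: replays see the fresh
// data you cudaMemcpy into d_input each iteration.
Ort::MemoryInfo gpu_info("Cuda", OrtDeviceAllocator, 0, OrtMemTypeDefault);
Ort::Value gpu_tensor = Ort::Value::CreateTensor<float>(
    gpu_info, d_input, in_count, in_shape.data(), in_shape.size());
```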
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
I'm using onnx-tensorrt.
When I enable trt_cuda_graph_enable like this:
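(The code from the report is not shown; a minimal sketch of what enabling the flag can look like through the V2 TensorRT provider options, which may differ from the original setup:)

```cpp
const OrtApi& api = Ort::GetApi();

// Create the V2 TensorRT provider options and turn on CUDA graph capture.
OrtTensorRTProviderOptionsV2* trt_options = nullptr;
Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));
const char* keys[] = {"trt_cuda_graph_enable"};
const char* values[] = {"1"};
Ort::ThrowOnError(
    api.UpdateTensorRTProviderOptions(trt_options, keys, values, 1));

// Attach the provider to the session options, then release the options.
Ort::SessionOptions session_options;
session_options.AppendExecutionProvider_TensorRT_V2(*trt_options);
api.ReleaseTensorRTProviderOptions(trt_options);
```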
Subsequently, no matter how many images I pass in for inference, I always get the result of the first image.
The following is my inference code (the attached screenshot failed to upload). The "input" and "output temp" buffers are reusable.
To reproduce
Urgency
No response
Platform
Windows
OS Version
WIN10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
No response