OvervCW opened this issue 9 months ago · Status: Open
It is definitely still an issue we are dealing with.
Still an issue.
Describe the issue
I'm using NVIDIA Triton to perform inference on various detection models through the onnxruntime backend, and this has always worked fine. However, after upgrading onnxruntime from 1.13.1 to 1.16.0, I started occasionally getting errors like this:
The request buffer sizes in the error are obviously completely wrong, so this looks like a bug in either onnxruntime or Triton, but I don't know enough about how the two interact to tell exactly where it's coming from.
I should note that it is not reliably reproducible: it doesn't happen for every inference request, and restarting the inference server is sometimes enough to make the problem go away without changing anything about the models or the Triton/onnxruntime versions.
To reproduce
So far I've only been able to reproduce this with two specific models, both CenterNet detection models.
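To help isolate whether the fault is in onnxruntime itself or in the Triton backend, something like the following standalone loop against the ONNX Runtime C++ API could be used. This is only a minimal sketch: the model path `model.onnx`, the tensor names `input`/`output`, the 1x3x512x512 NCHW shape, and the run count are all placeholders, not the actual values for the affected CenterNet models.

```cpp
#include <onnxruntime_cxx_api.h>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "centernet-repro");

  Ort::SessionOptions opts;
  // Use the CUDA execution provider on device 0, as in the Triton setup.
  OrtCUDAProviderOptions cuda_opts{};
  opts.AppendExecutionProvider_CUDA(cuda_opts);

  // "model.onnx" stands in for one of the affected CenterNet models.
  Ort::Session session(env, "model.onnx", opts);

  // Placeholder NCHW shape and dummy input data; substitute the model's real ones.
  std::vector<int64_t> shape{1, 3, 512, 512};
  std::vector<float> input_data(1 * 3 * 512 * 512, 0.0f);

  Ort::MemoryInfo mem_info =
      Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem_info, input_data.data(), input_data.size(),
      shape.data(), shape.size());

  // Placeholder tensor names; check the model with Netron or the ORT API.
  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};

  // The failure is intermittent, so hammer the session in a loop.
  for (int i = 0; i < 1000; ++i) {
    auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1,
                               output_names, 1);
    (void)outputs;
  }
  std::cout << "completed 1000 runs without error" << std::endl;
  return 0;
}
```

If a loop like this never fails outside Triton, that would point at the Triton onnxruntime backend's request/buffer handling rather than the runtime itself.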
Urgency
Medium. We upgraded to a new version of Triton/onnxruntime to fix minor issues with some other models, so we'd prefer not to have to downgrade.
Platform
Linux
OS Version
Ubuntu 22.04.3 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.2.2