Open AdnanEkici opened 3 months ago
Why 2 sessions? An inference session is stateless, so it can be called concurrently.
I would suspect resource contention between the sessions might be the problem, as each session has its own memory arenas and thread pools.
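The suggestion above (one session, called concurrently) can be sketched as below. This is a hypothetical illustration, not code from the issue: `run_concurrently` and `FakeSession` are invented names, and `session` stands in for an `onnxruntime.InferenceSession`, whose `run()` is safe to call from multiple threads.

```python
import threading

def run_concurrently(session, inputs_list, num_threads=2):
    """Feed each input dict to session.run() from a small thread pool.

    `session` is anything with a .run(output_names, input_feed) method,
    e.g. an onnxruntime.InferenceSession. Sharing ONE session this way
    avoids duplicating memory arenas, thread pools, and TensorRT engines.
    """
    results = [None] * len(inputs_list)
    lock = threading.Lock()
    next_idx = [0]  # shared cursor over the work queue

    def worker():
        while True:
            with lock:  # claim the next unprocessed input
                i = next_idx[0]
                if i >= len(inputs_list):
                    return
                next_idx[0] += 1
            # session.run() itself needs no lock; it is stateless
            results[i] = session.run(None, inputs_list[i])

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The key design point is that the threads contend only on the tiny work-queue cursor, not on the inference call itself.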
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
Hello, I am trying to deploy ONNX Runtime with the TensorRT execution provider. When I deploy my session on GPU 0, everything works perfectly (~50 ms per inference), but when I try to deploy the same YOLO model with 2 sessions (a subprocess architecture), inference drastically slows down to 200-300 ms.
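For context, a typical way to create such a session with the TensorRT EP looks like the sketch below. This is an assumed setup, not the reporter's actual code; the model filename is hypothetical, and the option keys (`device_id`, `trt_fp16_enable`, `trt_engine_cache_enable`) are documented TensorRT EP options. Engine caching matters here because each new session otherwise rebuilds its own TensorRT engine.

```python
def trt_provider_config(device_id=0, fp16=True):
    """Build a providers list for ort.InferenceSession, preferring
    TensorRT, then falling back to CUDA and finally CPU."""
    return [
        ("TensorrtExecutionProvider", {
            "device_id": device_id,            # which GPU to run on
            "trt_fp16_enable": fp16,           # allow FP16 kernels
            "trt_engine_cache_enable": True,   # reuse built TRT engines
        }),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]

# Usage (requires onnxruntime-gpu built with TensorRT support):
# import onnxruntime as ort
# sess = ort.InferenceSession("yolo.onnx", providers=trt_provider_config())
```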
To reproduce
-
Urgency
No response
Platform
Linux
OS Version
22.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16
ONNX Runtime API
Python
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
TensorRT 8.6.1
Model File
No response
Is this a quantized model?
Unknown