woaixiaoxiao opened this issue 2 months ago
One interesting caveat: only one function can be tested per run; otherwise the results may be inaccurate, because memory used by the previous function is not released in time.
InferenceSession::Run is stateless and can be called concurrently. Given that, do you need multiple sessions with the same model?
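To illustrate that suggestion, here is a minimal sketch (not code from this issue; the model path, tensor shape, and the I/O names "input"/"output" are placeholders) of one shared `Ort::Session` driven by several threads:

```cpp
#include <onnxruntime_cxx_api.h>

#include <array>
#include <cstdint>
#include <thread>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "shared-session");
  Ort::SessionOptions opts;
  // One session for the whole process; char* paths work on Linux.
  Ort::Session session(env, "model.onnx", opts);

  auto worker = [&session] {
    Ort::MemoryInfo mem =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    std::array<float, 4> data{1.f, 2.f, 3.f, 4.f};  // placeholder input
    std::array<int64_t, 2> shape{1, 4};             // placeholder shape
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem, data.data(), data.size(), shape.data(), shape.size());
    const char* in_names[] = {"input"};    // placeholder I/O names
    const char* out_names[] = {"output"};
    // Run() is stateless with respect to the session, so
    // concurrent calls on the same session are supported.
    auto out = session.Run(Ort::RunOptions{nullptr},
                           in_names, &input, 1, out_names, 1);
  };

  std::vector<std::thread> pool;
  for (int i = 0; i < 4; ++i) pool.emplace_back(worker);
  for (auto& t : pool) t.join();
}
```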
The settings that let ONNX Runtime use the provided bytes directly (without copying them) require an ORT-format model. See https://onnxruntime.ai/docs/performance/model-optimizations/ort-format-models.html#convert-onnx-models-to-ort-format
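For reference, the conversion described on that page is a one-line call to the tool shipped with the onnxruntime Python package (assuming a `model.onnx` in the current directory; it writes a `.ort` file next to the input):

```
python -m onnxruntime.tools.convert_onnx_models_to_ort model.onnx
```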
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
I want multiple threads to load the same model and perform inference in a data-parallel manner. To reduce memory usage, I want to avoid having each session read the ONNX file from disk into its own memory. My current approach is to read the ONNX file into memory once and then use CreateSessionFromArray to create the sessions, following https://github.com/microsoft/onnxruntime/issues/8328. However, it doesn't work as expected: CreateSessionFromArray does not reduce memory usage.
To reproduce
You can use the Python script to generate the ONNX file, then build and run the C++ code (a sketch of the approach is below).
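The attached scripts are not reproduced here, but the C++ side presumably looks like the following sketch (my reconstruction, assuming a "model.onnx" path, four sessions, and default CPU session options): the file is read into one buffer, and every session is created from that same buffer via the `Ort::Session` overload that wraps CreateSessionFromArray.

```cpp
#include <onnxruntime_cxx_api.h>

#include <fstream>
#include <vector>

int main() {
  // Read the ONNX file into memory exactly once.
  std::ifstream file("model.onnx", std::ios::binary | std::ios::ate);
  std::vector<char> bytes(static_cast<size_t>(file.tellg()));
  file.seekg(0);
  file.read(bytes.data(), static_cast<std::streamsize>(bytes.size()));

  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "from-array");
  Ort::SessionOptions opts;

  // Every session is constructed from the same in-memory buffer
  // (this overload wraps the C API's CreateSessionFromArray).
  std::vector<Ort::Session> sessions;
  for (int i = 0; i < 4; ++i) {
    sessions.emplace_back(env, bytes.data(), bytes.size(), opts);
  }
  // Observation from the issue: each session still ends up with its own
  // copy of the model/weights, so peak memory grows with the session count.
}
```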
Urgency
No response
Platform
Linux
OS Version
CentOS 8
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-linux-x64-gpu-1.19.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response