Describe the issue
I have an ONNX model that is only 204.57 MB on disk, but when I create the session, GPU memory consumption reaches 1.16 GB, and during inference it grows to 2.25 GB. This results in a high inference cost, so how can I reduce GPU memory consumption?
To reproduce
Simply create an ONNX Runtime session with the default options and observe GPU memory usage. The GPU memory measurement function is sketched below.
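A minimal reproduction sketch, assuming a placeholder model path (`model.onnx`), float32 inputs, and a pynvml-based helper; the original report did not include the exact measurement function, so the helper here is an illustrative stand-in:

```python
import numpy as np
import onnxruntime as ort
import pynvml

def gpu_memory_used_mb(device_index: int = 0) -> float:
    # Illustrative helper (assumed, not the original reporter's function):
    # read the currently used memory on the given GPU via NVML.
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    pynvml.nvmlShutdown()
    return info.used / (1024 ** 2)

print(f"before session:         {gpu_memory_used_mb():.0f} MB")

# "model.onnx" is a placeholder for the 204.57 MB model; default session options.
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
print(f"after session creation: {gpu_memory_used_mb():.0f} MB")

# Run one inference with dummy inputs; dynamic dimensions are set to 1 and
# float32 dtype is assumed for all inputs.
inputs = {}
for inp in sess.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    inputs[inp.name] = np.random.rand(*shape).astype(np.float32)
sess.run(None, inputs)
print(f"after inference:        {gpu_memory_used_mb():.0f} MB")
```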
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.11.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
11.4
Model File
No response
Is this a quantized model?
No