Describe the issue
With the QNN execution provider, we see that loading the first model allocates ~800 MB of memory, and each additional model loaded allocates another ~100 MB. When an Ort::Session is destroyed, the ~100 MB for that model is freed, but ~700 MB remains in use (possibly held by QNN). Is it possible to destroy the QNN execution provider itself, so that the remaining ~700 MB can somehow be freed?
This is especially important in case we want to switch execution providers dynamically.
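A minimal C++ sketch of the load/destroy cycle described above, for reference. The model path and `backend_path` value are placeholders, and this assumes a QNN-enabled ONNX Runtime build on Windows, so it is not a definitive repro:

```cpp
// Hypothetical repro sketch; paths and options are placeholders.
#include <onnxruntime_cxx_api.h>
#include <string>
#include <unordered_map>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "qnn-memory-repro");

    Ort::SessionOptions options;
    // Enable the QNN EP; backend_path is platform-specific (HTP backend here).
    std::unordered_map<std::string, std::string> qnn_options;
    qnn_options["backend_path"] = "QnnHtp.dll";
    options.AppendExecutionProvider("QNN", qnn_options);

    {
        // First session load: ~800 MB allocated (per the observation above).
        Ort::Session session(env, L"model.onnx", options);
    }  // Session destroyed here: only ~100 MB is released, ~700 MB remains.

    // There appears to be no public API call at this point to tear down the
    // QNN execution provider itself and reclaim the remaining memory.
    return 0;
}
```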
To reproduce
Load a model with the QNN EP, then destroy the Ort::Session of the model.
Urgency
TBD
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
Latest
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Other / Unknown
Execution Provider Library Version
QNN 2.24
Model File
We are using quantized models.
Is this a quantized model?
Yes