[Performance] C++ API: destroy the execution provider when the `Ort::Session` is destroyed #22511

Open kristoftunner opened 2 days ago

kristoftunner commented 2 days ago

Describe the issue

With the QNN execution provider, we see that ~800 MB of memory is allocated when the first model is loaded, and ~100 MB more is allocated for each additional model. When an Ort::Session is destroyed, the ~100 MB for that model is freed, but ~700 MB remains in use (possibly held by QNN). Is it possible to destroy the QNN execution provider and thereby free that remaining ~700 MB?

This is especially important if we want to switch execution providers dynamically.

To reproduce

Load a model with the QNN EP, then destroy the model's Ort::Session and observe the process memory. A minimal sketch of the reproduction is shown below.
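
A minimal C++ sketch of the reproduction, assuming the QNN HTP backend; the model path and provider options are illustrative, not from the original report:

```cpp
#include <onnxruntime_cxx_api.h>

#include <string>
#include <unordered_map>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "qnn-memory-repro");

  Ort::SessionOptions session_options;
  // "backend_path" selects the QNN backend library (HTP here); values are illustrative.
  std::unordered_map<std::string, std::string> qnn_options{
      {"backend_path", "QnnHtp.dll"}};
  session_options.AppendExecutionProvider("QNN", qnn_options);

  {
    // ~800 MB is reportedly allocated on the first model load (hypothetical model path).
    Ort::Session session(env, L"model.quant.onnx", session_options);
    // ... run inference ...
  }  // Ort::Session is destroyed here: ~100 MB is freed, ~700 MB stays resident.

  // Open question from this issue: how can the QNN EP itself be torn down at this
  // point so that the remaining ~700 MB is released?
  return 0;
}
```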

Urgency

TBD

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

Latest

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Other / Unknown

Execution Provider Library Version

QNN 2.24

Model File

We are using quantized models.

Is this a quantized model?

Yes

HectorSVC commented 4 hours ago

Could you also try loading the model with the CPU EP after destroying the Ort::Session that used the QNN EP? Maybe the memory is being held by a memory pool.
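
A minimal sketch of that experiment, reusing the hypothetical model path from the reproduction above (appending no execution provider means the default CPU EP is used):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "cpu-after-qnn");

  // 1) Create and destroy a QNN-backed Ort::Session as in the reproduction above.
  // 2) Then load the same model on the CPU EP and watch process memory:
  Ort::SessionOptions cpu_options;  // nothing appended -> default CPU EP
  Ort::Session cpu_session(env, L"model.quant.onnx", cpu_options);

  // If the ~700 MB is reused or released here, it was likely held by an
  // ONNX Runtime memory pool/arena rather than by the QNN backend itself.
  return 0;
}
```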