openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Performance]: how to release compiled model or infer request's memory under python #24891

Open wang7393 opened 1 month ago

wang7393 commented 1 month ago

OpenVINO Version

2024.1

Operating System

Ubuntu 22.04 (LTS)

Device used for inference

CPU

OpenVINO installation

PyPi

Programming Language

Python

Hardware Architecture

x86 (64 bits)

Model used

geomvsnet

Model quantization

No

Target Platform

No response

Performance issue description

Our application (which consists of several models) needs a lot of runtime memory, more than 10 GB, but our target hardware only has 8 GB. After one of the models runs inference once, memory usage increases by about 5 GB. I tried to release the infer request's memory with `del infer_request`, but the memory is not released. I would like to know how to manually release the memory footprint generated by model inference during cyclic inference in Python.
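
For reference, a minimal sketch of the pattern described above (the model path, device, and input shape are placeholders, not taken from the issue):

```python
import numpy as np
import openvino as ov

core = ov.Core()
compiled_model = core.compile_model("geomvsnet.xml", "CPU")  # hypothetical model path

for _ in range(10):  # cyclic inference
    infer_request = compiled_model.create_infer_request()
    dummy_input = np.zeros((1, 3, 480, 640), dtype=np.float32)  # placeholder shape
    infer_request.infer([dummy_input])
    # Attempted cleanup: this only drops the Python reference to the request.
    del infer_request
```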

ilya-lavrenov commented 1 month ago

Memory can live in 2 places:

> After one of the models inferences once, the memory usage increases by 5G. So I try to release the model's infer request memory by using "del infer_request", but can not release it.

With `del infer_request` you delete only one category of memory.
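
A hedged sketch of what releasing both categories could look like in Python, assuming (consistent with the comment above) that the remaining memory is held by the compiled model itself rather than by the request; the model path and input shape are placeholders:

```python
import gc

import numpy as np
import openvino as ov

core = ov.Core()
compiled_model = core.compile_model("geomvsnet.xml", "CPU")  # hypothetical model path

infer_request = compiled_model.create_infer_request()
infer_request.infer([np.zeros((1, 3, 480, 640), dtype=np.float32)])  # placeholder shape

# Release the memory owned by the request (e.g. intermediate tensors) ...
del infer_request
# ... and the memory owned by the compiled model (graph and constant data).
del compiled_model
gc.collect()  # encourage Python to finalize the now-unreferenced wrappers promptly
```

Even after both objects are destroyed, the process's resident memory may not shrink immediately, since the allocator can keep freed pages around for reuse.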