openvinotoolkit / model_server

A scalable inference server for models optimized with OpenVINO™
https://docs.openvino.ai/2024/ovms_what_is_openvino_model_server.html
Apache License 2.0

Memory leaks on infer requests #2102

Open darkestpigeon opened 10 months ago

darkestpigeon commented 10 months ago

Describe the bug The server leaks memory on infer requests.

To Reproduce

  1. Get the archive ov-leak-debug.tar.gz; it contains a load-simulation script, a docker-compose file to run the server, the models with their model_config.json, and (for reference) the script that creates the models.
  2. Run the server with docker compose up -d.
  3. Install the dependencies pip install -r requirements.txt.
  4. Run the script python generate_ovms_load.py {model_name} --n-workers 10 --n-threads 10, where {model_name} is one of static, dynamic, or dynamic-nms (a rough sketch of such a load client is shown after this list).
  5. Check OVMS memory usage and watch it creep up: it fluctuates for static, grows slowly for dynamic, and grows rapidly for dynamic-nms. The memory usage does not go back down even after the load is removed.
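
For reference, a minimal sketch of such a load client is below. This is not the script from the archive: the use of ovmsclient, the gRPC port 9000, the input tensor name "input", and the input shape are all assumptions and would need to match the deployed models.

```python
# Hypothetical approximation of generate_ovms_load.py -- the actual script ships
# in ov-leak-debug.tar.gz. Port, input name and shape are assumptions.
import argparse
import concurrent.futures

import numpy as np
from ovmsclient import make_grpc_client  # pip install ovmsclient


def worker(model_name: str, n_requests: int) -> None:
    # Each worker holds its own gRPC client and issues predict calls in a loop.
    client = make_grpc_client("localhost:9000")
    for _ in range(n_requests):
        data = np.random.rand(1, 3, 224, 224).astype(np.float32)
        client.predict(inputs={"input": data}, model_name=model_name)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("model_name")
    parser.add_argument("--n-workers", type=int, default=10)
    parser.add_argument("--n-requests", type=int, default=10_000)
    args = parser.parse_args()

    # The real script also takes --n-threads; here a single thread pool stands
    # in for the worker/thread split.
    with concurrent.futures.ThreadPoolExecutor(max_workers=args.n_workers) as pool:
        futures = [
            pool.submit(worker, args.model_name, args.n_requests)
            for _ in range(args.n_workers)
        ]
        for future in futures:
            future.result()


if __name__ == "__main__":
    main()
```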

Expected behavior The memory used by OVMS remains constant (or stabilizes after some time).
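
One way to check whether memory stabilizes is to sample the container's usage over time with docker stats. A minimal sketch follows; the container name ovms and the 5-second interval are assumptions and should be adjusted to the docker-compose setup.

```python
# Hypothetical memory sampler; the container name "ovms" and the 5 s sampling
# interval are assumptions -- use the name from the docker-compose file.
import subprocess
import time

CONTAINER = "ovms"

while True:
    # "docker stats --no-stream" returns a single snapshot instead of streaming.
    result = subprocess.run(
        ["docker", "stats", CONTAINER, "--no-stream", "--format", "{{.MemUsage}}"],
        capture_output=True, text=True, check=True,
    )
    print(time.strftime("%H:%M:%S"), result.stdout.strip())
    time.sleep(5)
```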

Logs Not applicable; no explicit errors are present.

Configuration

  1. OpenVINO Model Server 2023.1.d789fb785, OpenVINO backend 2023.1.0.12185.9e6b00e51cd
  2. Checked on a 12th Gen Intel(R) Core(TM) i9-12900H and on an AMD Ryzen 9 5950X 16-Core Processor
  3. Config and models included in the archive
dkalinowski commented 8 months ago

@darkestpigeon Thank you for your report. We have tested the models and scripts from the attachments. Below is a 4-day overview (with some breaks) of docker container memory usage and RPS: [attached chart: workload_monitoring]

How long did your testing workload run?