triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Python backend SHM memory leak #7481

Open mbahri opened 2 months ago

mbahri commented 2 months ago

Description

I am encountering two possibly related issues with the Python backend and shared memory:

  1. During operation, shared memory usage keeps growing, eventually leading to errors. It looks like the shared memory regions allocated by the Python backend for its inputs are not recycled (see the monitoring sketch at the end of this issue). I understand the SHM region grows based on the size of the inputs, but this is an issue especially when multiple model instances are running. It is also possible that the region grows beyond the largest input if memory is leaked instead of reused.
  2. After the Triton container is terminated, allocated shared memory regions remain in /dev/shm.

Triton Information

What version of Triton are you using? 2.47.0

Are you using the Triton container or did you build it yourself? Official containers:

To Reproduce

I encountered the issue with any Python-based model I tried:

Expected behavior

  1. Shm regions would be shrunk, or at least wouldn't grow indefinitely (arena-style allocator?)
  2. Shm regions would be de-allocated when the model shuts down
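
For issue (1), shared memory growth inside the container can be tracked while sending load. Below is a small monitoring sketch (not part of Triton); it assumes the default /dev/shm mount and simply counts and sums whatever regions are present:

    import os
    import time

    SHM_DIR = "/dev/shm"

    def shm_usage_bytes():
        # Sum the sizes of all shared memory regions currently present.
        total = 0
        for name in os.listdir(SHM_DIR):
            try:
                total += os.path.getsize(os.path.join(SHM_DIR, name))
            except OSError:
                pass  # region disappeared between listdir() and stat()
        return total

    if __name__ == "__main__":
        while True:
            count = len(os.listdir(SHM_DIR))
            print(f"{count} regions, {shm_usage_bytes() / 1e6:.1f} MB in {SHM_DIR}")
            time.sleep(5)
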
rmccorm4 commented 2 months ago

Hi @mbahri,

Do you have a minimal model, client, and steps you could share for reproducing to help expedite debugging? If it is a generic python backend shm issue, then a simple python model not doing anything interesting (identity, etc.) may be able to reproduce it.
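
For reference, a bare-bones identity model along those lines would be something like this (just a sketch; the INPUT0/OUTPUT0 names and the matching config.pbtxt are assumed, not taken from your setup):

    # model.py for a pass-through Python backend model.
    # A config.pbtxt declaring one input "INPUT0" and one output "OUTPUT0"
    # (same dtype and shape) is assumed to sit next to it.
    import triton_python_backend_utils as pb_utils

    class TritonPythonModel:
        def execute(self, requests):
            responses = []
            for request in requests:
                # Copy the input tensor straight into the output tensor.
                input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
                output_tensor = pb_utils.Tensor("OUTPUT0", input_tensor.as_numpy())
                responses.append(
                    pb_utils.InferenceResponse(output_tensors=[output_tensor])
                )
            return responses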

CC @Tabrizian @kthui @krishung5 for viz

rodrigo-orlandini commented 1 month ago

Hi everyone,

@mbahri, has this been solved already? If it has, could you provide an explanation of the solution?

I'm facing a similar problem here. We already wrote a GitHub issue describing this problem and a ticket was opened, but it is still occurring and we don't have a solution.

@rmccorm4, there are some steps and metrics that could be used to reproduce and analyse the problem. You can check them here: https://github.com/triton-inference-server/server/issues/6720

fangpings commented 3 weeks ago

We are facing the same issues in our models. Any more updates on this?

Also, regarding the second issue where /dev/shm is not cleaned up after the container restarts: if you are in a k8s environment, we've used a hacky way to clean it once the container restarts, so at least the container won't end up in CrashLoopBackOff because it has no shared memory available:

                  "lifecycle": {
                     "postStart": {
                        "exec": {
                           "command": ["/bin/sh", "-c", "rm -f /dev/shm/*"]
                        }
                     }
                  },
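
Outside of k8s, the same cleanup can be done by a small wrapper entrypoint that clears /dev/shm before starting tritonserver. A sketch (entrypoint.py is a made-up name, and it assumes nothing else in the container relies on files already present in /dev/shm):

    # entrypoint.py - hypothetical wrapper script, not shipped with Triton.
    import glob
    import os
    import sys

    # Remove shared memory regions left over from a previous unclean shutdown.
    for path in glob.glob("/dev/shm/*"):
        try:
            os.remove(path)
        except OSError:
            pass

    # Replace this process with tritonserver, forwarding any arguments.
    os.execvp("tritonserver", ["tritonserver"] + sys.argv[1:])
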
ash2703 commented 3 hours ago

Facing a similar issue when deploying on k8s: SHM grows and the pod is killed with OOM.

I do not encounter this when testing without k8s.