triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Client side memory leak in python sdk using shared memory #2274

Closed. philipp-schmidt closed this issue 3 years ago.

philipp-schmidt commented 3 years ago

Description
The Python client SDK currently leaks memory when shared memory regions are registered and unregistered repeatedly in a single process. This is because the current implementation does not unmap the shared memory on destroy.

Triton Information
Version: nvcr.io/nvidia/tritonserver:20.08-py3

Are you using the Triton container or did you build it yourself? NGC

To Reproduce
Repeat in a single process:

1. triton_client.register_system_shared_memory()
2. shm.set_shared_memory_region()
3. triton_client.unregister_system_shared_memory()
4. shm.destroy_shared_memory_region()
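Concretely, a minimal loop along these lines reproduces the growth (a sketch only: the region name, shm key, byte size, and iteration count are arbitrary, the server is assumed to be reachable at localhost:8000, and the inference call itself is elided):

```python
import numpy as np
import tritonclient.http as httpclient
import tritonclient.utils.shared_memory as shm

client = httpclient.InferenceServerClient(url="localhost:8000")

# 64 MiB per pass makes the growth easy to see in the process RSS.
byte_size = 64 * 1024 * 1024
data = np.zeros(byte_size // 4, dtype=np.float32)

for _ in range(1000):
    # Create the region, copy the data in, and register it with the server.
    handle = shm.create_shared_memory_region("input_data", "/input_shm", byte_size)
    shm.set_shared_memory_region(handle, [data])
    client.register_system_shared_memory("input_data", "/input_shm", byte_size)

    # ... inference using the region would go here ...

    # Tear it down again. In the 20.08 client, destroy_shared_memory_region()
    # only shm_unlink()s the region and never munmap()s it, so RSS grows by
    # byte_size on every iteration.
    client.unregister_system_shared_memory("input_data")
    shm.destroy_shared_memory_region(handle)
```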

Expected behavior
RAM usage of the process does not constantly increase.

Additional info
The current client implementation only calls shm_unlink: https://github.com/triton-inference-server/server/blob/8ea7cc316d5d5baf4f6367c7be298e93f4e9876f/src/clients/python/library/tritonclient/utils/shared_memory/shared_memory.cc#L130-L140

The server on the other hand properly unmaps the memory: https://github.com/triton-inference-server/server/blob/12f29e1d76d4d031b630f36ad0a27429d7f5716e/src/servers/shared_memory_manager.cc#L171-L182

According to this site, the memory is only released once it has been both unmapped and unlinked:

> The shm_unlink function unlinks the shared-memory object. Memory objects are persistent, which means the contents remain until all references have been unmapped and the shared-memory object has been unlinked with a call to the shm_unlink function.
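The same two-step teardown can be observed with Python's standard library (an analogy to illustrate the POSIX semantics, not the Triton client API): SharedMemory.close() is the munmap() counterpart and SharedMemory.unlink() the shm_unlink() counterpart, and the OS reclaims the pages only once both have happened:

```python
from multiprocessing import shared_memory

# Create a 64 MiB POSIX shared memory object; this mmap()s it into the process.
region = shared_memory.SharedMemory(create=True, size=64 * 1024 * 1024)
region.buf[:5] = b"hello"

region.unlink()  # shm_unlink(): the name is gone, but the mapping remains
region.close()   # munmap(): only now can the OS reclaim the memory
```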

Probable quick-and-dirty fix (untested, written right here and now):

```cpp
int
SharedMemoryRegionDestroy(void* shm_handle)
{
  SharedMemoryHandle* handle =
      reinterpret_cast<SharedMemoryHandle*>(shm_handle);

  // Unlink the shared-memory object, as the current implementation does.
  // Note: shm_unlink() returns 0 or -1, not a file descriptor.
  if (shm_unlink(handle->shm_key_.c_str()) == -1) {
    return -5;
  }

  // New: also unmap the region, so the process actually releases the pages.
  int status = munmap(handle->base_addr_, handle->byte_size_);
  if (status == -1) {
    return status;
  }

  return 0;
}
```
CoderHam commented 3 years ago

Fixed by: https://github.com/triton-inference-server/server/pull/2279

philipp-schmidt commented 3 years ago

Perfect, thanks! In which PyPI package version will this fix be included? 2.4.0 won't be changed, right? 2.5.0? Asking because I could use this fix in production atm.

CoderHam commented 3 years ago

It will make it into 2.6.0 (the 20.12 release). We code-froze for 20.11 a while ago. In the meantime you can use the patch I shared to build a fixed version of the Python library; to my understanding it should be backward compatible.

philipp-schmidt commented 3 years ago

Alright, will do. Thanks again for the fast fix as always.