triton-inference-server / client

Triton Python, C++ and Java client libraries, and GRPC-generated client examples for go, java and scala.
BSD 3-Clause "New" or "Revised" License

Memory leak in SharedMemoryTensor.__dlpack__ #598

Open vodnikss opened 5 months ago

vodnikss commented 5 months ago

Hello, I detected a memory leak when running the code below with Python 3.10, triton-client 2.41.1 and torch 2.1.2.

import torch
import tritonclient.utils.cuda_shared_memory as cudashm

n1 = 1000
n2 = 1000
gpu_tensor = torch.ones([n1, n2]).cuda(0)
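byte_size = 4 * n1 * n2  # FP32: 4 bytes per element
shm_handle = cudashm.create_shared_memory_region("cudashm_data", byte_size, 0)
# Memory usage grows with every iteration of this loop
while True:
    cudashm.set_shared_memory_region_from_dlpack(shm_handle, [gpu_tensor])
    smt = cudashm.as_shared_memory_tensor(shm_handle, "FP32", [n1, n2])
    # torch.from_dlpack calls smt.__dlpack__(), which is where the leak occurs
    generated_torch_tensor = torch.from_dlpack(smt)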

The leak occurs when the __dlpack__ method is called inside torch.from_dlpack(smt).

ParshikovMM commented 5 months ago

Hello! I've encountered the same issue. It may be a reference-counting error in the ctypes part of the code: a reference to an object is acquired but never released.
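To illustrate the kind of bug suspected here: when Python code hands an object to C through ctypes and raises its reference count with `ctypes.pythonapi.Py_IncRef`, a later `Py_DecRef` must give that reference back, otherwise the object can never be freed. The sketch below uses only the standard library and a hypothetical `Payload` class (not part of tritonclient); it counts how many of the references taken this way were left unmatched.

```python
import ctypes
import sys


class Payload:
    """Hypothetical stand-in for an object whose lifetime is shared with C code."""


def unmatched_refs(increfs: int, decrefs: int) -> int:
    """Return how many references taken via Py_IncRef were not given back."""
    if decrefs > increfs:
        raise ValueError("more Py_DecRef than Py_IncRef would free a live object")
    obj = Payload()
    handle = ctypes.py_object(obj)  # ctypes wrapper; holds a reference of its own
    before = sys.getrefcount(obj)
    for _ in range(increfs):
        ctypes.pythonapi.Py_IncRef(handle)  # reference handed to C code
    for _ in range(decrefs):
        ctypes.pythonapi.Py_DecRef(handle)  # C code releasing its reference
    leaked = sys.getrefcount(obj) - before
    for _ in range(leaked):
        ctypes.pythonapi.Py_DecRef(handle)  # undo the imbalance so this demo frees obj
    return leaked
```

For example, `unmatched_refs(3, 1)` returns 2: two references were taken on the C side and never released. Without the clean-up loop at the end, such an object would stay alive for the rest of the process.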

I tracked down the "immortal" objects using tracemalloc. Running similar code for 100,000 iterations gives the following top-10 list:

[ Top 10 ]
/miniforge3/envs/env/lib/python3.11/ctypes/__init__.py:512: size=30.5 MiB, count=200001, average=160 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_dlpack.py:141: size=19.1 MiB, count=200014, average=100 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_dlpack.py:135: size=13.0 MiB, count=100019, average=136 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_dlpack.py:140: size=13.0 MiB, count=100000, average=136 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_dlpack.py:137: size=13.0 MiB, count=100000, average=136 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_shared_memory_tensor.py:65: size=8597 KiB, count=200000, average=44 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_shared_memory_tensor.py:75: size=6250 KiB, count=100000, average=64 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_shared_memory_tensor.py:74: size=6250 KiB, count=99999, average=64 B
/miniforge3/envs/env/lib/python3.11/site-packages/torch/utils/dlpack.py:121: size=25.0 KiB, count=483, average=53 B
/miniforge3/envs/env/lib/python3.11/tracemalloc.py:505: size=1400 B, count=25, average=56 B

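Several allocation sites have object counts that match the number of iterations (about 100,000, or about 200,000 for sites that allocate twice per iteration), so the objects created on every call are never freed.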
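A listing like the one above can be produced with the standard tracemalloc API. The sketch below is a minimal stand-in, not the exact script used here: the hypothetical `leaky_step` function and `_retained` list replace the Triton/PyTorch loop and deliberately keep one small object alive per call. It prints a top-10 list grouped by source line, then uses `Snapshot.compare_to` to report how many new live allocations each iteration leaves behind, which is the proportionality described above.

```python
import tracemalloc

_retained = []  # hypothetical sink standing in for objects that are never freed


def leaky_step() -> None:
    """Hypothetical per-iteration work that keeps one small object alive forever."""
    _retained.append(bytearray(64))


def trace(iterations: int):
    """Run leaky_step under tracemalloc; return (top-10 stats, per-line diffs)."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(iterations):
        leaky_step()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    top10 = after.statistics("lineno")[:10]
    diffs = after.compare_to(before, "lineno")  # largest change first
    return top10, diffs


if __name__ == "__main__":
    n = 100_000
    top10, diffs = trace(n)
    print("[ Top 10 ]")
    for stat in top10:
        print(stat)
    # A leak shows up as a line whose allocation count grows in proportion to n
    worst = max(diffs, key=lambda d: d.count_diff)
    print(f"{worst.count_diff / n:.2f} new allocations per iteration at {worst.traceback}")
```

Comparing two snapshots isolates the growth caused by the loop itself, so allocations that existed before the loop started do not clutter the result.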

The issue was resolved by removing the line marked "problem line" (shown commented out below) from DataViewContext.as_manager_ctx in _dlpack.py:

import ctypes


# Use as managed context in DLPack that doesn't hold ownership of the
# data content.
class DataViewContext:
    def __init__(self, shape) -> None:
        # Convert the Python object to ctypes objects expected by
        # DLPack
        self._shape = (ctypes.c_int64 * len(shape))(*shape)
        # No strides: compact and row-major
        self._strides = ctypes.POINTER(ctypes.c_int64)()

    def as_manager_ctx(self) -> ctypes.c_void_p:
        py_obj = ctypes.py_object(self)
        py_obj_ptr = ctypes.pointer(py_obj)
        ctypes.pythonapi.Py_IncRef(py_obj)
        # problem line: this extra reference to py_obj_ptr is never released
        # ctypes.pythonapi.Py_IncRef(ctypes.py_object(py_obj_ptr))
        return ctypes.cast(py_obj_ptr, ctypes.c_void_p)
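One way to see why the extra `Py_IncRef` leaks the whole context: `ctypes.pointer(py_obj)` keeps `py_obj` alive, and `py_obj` holds a reference to the `DataViewContext`, so a reference on `py_obj_ptr` that is never released pins all of them. The sketch below reproduces this with the standard library only, using a trimmed copy of the class with a flag to toggle the problem line. The deleter is an assumption, since the real tritonclient deleter is not shown in this thread: `hypothetical_deleter` releases only the one reference taken on the context object. In this toy setting the returned `c_void_p` stays referenced until the deleter has run.

```python
import ctypes
import gc
import weakref


class DataViewContext:
    # Trimmed copy of the class quoted above (strides omitted)
    def __init__(self, shape) -> None:
        self._shape = (ctypes.c_int64 * len(shape))(*shape)

    def as_manager_ctx(self, incref_pointer: bool) -> ctypes.c_void_p:
        py_obj = ctypes.py_object(self)
        py_obj_ptr = ctypes.pointer(py_obj)  # keeps py_obj (and thus self) alive
        ctypes.pythonapi.Py_IncRef(py_obj)
        if incref_pointer:
            # The "problem line": nothing ever releases this reference
            ctypes.pythonapi.Py_IncRef(ctypes.py_object(py_obj_ptr))
        return ctypes.cast(py_obj_ptr, ctypes.c_void_p)


def hypothetical_deleter(manager_ctx: ctypes.c_void_p) -> None:
    # Assumption: the consumer's deleter drops only the reference that
    # as_manager_ctx took on the context object itself.
    py_obj = ctypes.cast(manager_ctx, ctypes.POINTER(ctypes.py_object)).contents
    ctypes.pythonapi.Py_DecRef(py_obj)


def context_survives(incref_pointer: bool) -> bool:
    """Return True if the context is still alive after every user is done with it."""
    ctx = DataViewContext([1000, 1000])
    alive = weakref.ref(ctx)
    manager_ctx = ctx.as_manager_ctx(incref_pointer)
    hypothetical_deleter(manager_ctx)
    del ctx, manager_ctx
    gc.collect()  # ctypes.cast creates reference cycles, so collect them explicitly
    return alive() is not None
```

With the problem line enabled, `context_survives(True)` reports the context as still alive after every user has released it; with the line removed, `context_survives(False)` lets it be freed.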

With that line removed, no allocation site in the top-10 list scales with the iteration count any more:

[ Top 10 ]
/miniforge3/envs/env/lib/python3.11/site-packages/torch/utils/dlpack.py:121: size=25.2 KiB, count=486, average=53 B
/miniforge3/envs/env/lib/python3.11/ctypes/__init__.py:512: size=19.4 KiB, count=125, average=159 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_dlpack.py:141: size=14.7 KiB, count=138, average=109 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_dlpack.py:135: size=11.1 KiB, count=81, average=141 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_dlpack.py:140: size=8432 B, count=62, average=136 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_dlpack.py:137: size=8432 B, count=62, average=136 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_shared_memory_tensor.py:65: size=5456 B, count=124, average=44 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_shared_memory_tensor.py:75: size=3968 B, count=62, average=64 B
/miniforge3/envs/env/lib/python3.11/site-packages/tritonclient/utils/_shared_memory_tensor.py:74: size=3968 B, count=62, average=64 B
/miniforge3/envs/env/lib/python3.11/tracemalloc.py:505: size=1400 B, count=25, average=56 B