marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

CUDA shared memory registration fails when DeepStream sends inference requests to an external Triton server #528

Open yoo-wonjun opened 2 months ago

yoo-wonjun commented 2 months ago

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version: 8.4.0.11
• NVIDIA GPU Driver Version (valid for GPU only): 525.105.17
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

The execution environment is as follows:

• Triton server version: 23.10
• DeepStream sends inference requests to a Triton server running in a separate Docker container.
• In DeepStream's config.pbtxt, enable_cuda_buffer_sharing: true is set.
• When DeepStream makes a single inference request to one GPU, it runs normally.
• When multiple DeepStream inference requests are made, the errors below occur, but over time things stabilize and the multiple DeepStream pipelines run normally.

ERROR: infer_grpc_client.cpp:223 Failed to register CUDA shared memory.
ERROR: infer_grpc_client.cpp:311 Failed to set inference input: failed to register CUDA shared memory region ‘inbuf_0x2be8300’: failed to open CUDA IPC handle: invalid argument
ERROR: infer_grpc_backend.cpp:140 gRPC backend run failed to create request for model: yolov8_pose
ERROR: infer_trtis_backend.cpp:350 failed to specify dims when running inference on model:yolov8_pose, nvinfer error:NVDSINFER_TRITON_ERROR

I want to prevent these errors when making multiple inference requests.
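For reference, here is a minimal sketch of the Gst-nvinferserver gRPC configuration I am describing (the url, unique_id, and gpu_ids values are placeholders, not my exact settings):

```
infer_config {
  unique_id: 1
  gpu_ids: [0]
  backend {
    triton {
      model_name: "yolov8_pose"
      version: -1
      grpc {
        url: "10.10.10.10:8001"            # placeholder: address of the external Triton server
        enable_cuda_buffer_sharing: true   # share input buffers via CUDA IPC instead of copying them over gRPC
      }
    }
  }
}
```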

• Requirement details (This is for new requirement. Including the module name - for which plugin or for which sample application, the function description)

I want to prevent these errors when making multiple inference requests. From the documentation I can see that enable_cuda_buffer_sharing: true is valid when the Triton server runs inside the DeepStream Docker container, but I confirmed that, given some time, it also runs normally against an external Triton server. Please tell me how to prevent the above errors from occurring.
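If CUDA buffer sharing is simply not supported with an external Triton server, would the expected workaround be to disable it so inputs are sent over gRPC instead of via CUDA IPC handles? A sketch of what I mean (my assumption, not something taken from the docs):

```
infer_config {
  backend {
    triton {
      model_name: "yolov8_pose"
      grpc {
        url: "10.10.10.10:8001"             # placeholder external Triton address
        enable_cuda_buffer_sharing: false   # assumed workaround: fall back to sending tensors over gRPC
      }
    }
  }
}
```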