triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Failed to open the cudaIpcHandle when I call an ONNX / TRT backend from Python backend #5483

Closed BasharMahasen closed 1 year ago

BasharMahasen commented 1 year ago

Hi, I am getting the below error message when I call an ONNX / TRT backend from a Python backend: "Failed to open the cudaIpcHandle. error: invalid argument".

Description: I am receiving the above message whenever I try to call an ONNX model from a Triton Python backend. The server logs show that the ONNX model is called and executed; however, the results are never returned to the Python backend.

The same behaviour happens for both the ONNX and TRT backends. Direct inference against the ONNX / TRT model works fine. However, my scenario requires a Python front-end model to handle the pre/post-processing tasks.

Triton Information: What version of Triton are you using? I am facing this issue on 23.02-py3 and 23.01-py3.

Are you using the Triton container or did you build it yourself? I am using the Triton container.

To Reproduce

Here are the model definitions, the native client logs, and the server logs.

name: "py_main_model" backend: "python" max_batch_size:10

input [ { name: "image_name" data_type: TYPE_STRING dims: [-1] } ]

input [ { name: "image_bytes_b46" data_type: TYPE_STRING dims: [-1] } ]

output [ { name: "result", data_type: TYPE_FP32, dims: [ -1,1]

} ]

instance_group: [ { name: "py_main_model", kind: KIND_CPU, count: 1 } ]


import json
import cv2
from cv2.dnn import blobFromImage
import numpy as np
import triton_python_backend_utils as pb_utils
import time
import base64

class TritonPythonModel:

    def initialize(self, args):
        # args['model_config'] is a JSON string and must be parsed here
        self.model_config = model_config = json.loads(args['model_config'])
        self.model_std = 128.0
        self.model_mean = 127.5
        self.model_height = 640
        self.model_width = 640
        self.detection_model_name = "onnx_scrfd_10g_bnkps_detection_model"
        # Output names requested from the detection model by the BLS call in execute().
        # This attribute is not defined in the original snippet; it is reconstructed
        # here from the ONNX model configuration shown below.
        self.detection_model_outputs_name = [
            "score_8", "bbox_8", "kps_8",
            "score_16", "bbox_16", "kps_16",
            "score_32", "bbox_32", "kps_32"]
        self.logger = pb_utils.Logger

    def execute(self, requests):
        responses = []
        for request in requests:
            t0 = time.perf_counter()  # start of preprocessing (t0/t1 are used in the log below but were not defined in the original snippet)

            input0_tensor = pb_utils.get_input_tensor_by_name(request, "image_name")
            input1_tensor = pb_utils.get_input_tensor_by_name(request, "image_bytes_b46")

            # Both inputs are BYTES tensors of shape [1, 1]; decode them to Python strings.
            input0_data = input0_tensor.as_numpy()[0][0]
            image_name = input0_data.decode("utf-8")
            input1_data = input1_tensor.as_numpy()[0][0]
            img_jpg_bytes_b64 = input1_data.decode("utf-8")
            img_jpg_str = base64.b64decode(img_jpg_bytes_b64)

            # Decode the JPEG bytes and run the preprocessing helpers
            # (preprocess_for_detection / convert_to_blob are not shown in the post).
            image_data = cv2.imdecode(np.frombuffer(img_jpg_str, np.uint8), cv2.IMREAD_COLOR)  # as numpy array
            processed_img, resize_scale = self.preprocess_for_detection(img=image_data)
            img_blob = self.convert_to_blob(processed_img)

            # Create an input tensor for the detection model. Note that random data is
            # sent instead of img_blob (presumably just to reproduce the issue).
            input_tensor = pb_utils.Tensor("input.1", np.random.random((1, 3, 640, 640)).astype("float32"))
            t1 = time.perf_counter()  # start of the BLS call
            infer_request = pb_utils.InferenceRequest(
                request_id=str(1),
                model_name=self.detection_model_name,
                requested_output_names=self.detection_model_outputs_name,
                inputs=[input_tensor])

            infer_response = infer_request.exec()
            if infer_response.has_error():
                raise pb_utils.TritonModelException(
                    infer_response.error().message())
            # The "Failed to open the cudaIpcHandle" error reported in this issue
            # surfaces while handling this BLS response.
            inference_response = pb_utils.InferenceResponse(output_tensors=infer_response.output_tensors())
            responses.append(inference_response)
            self.logger.log_info(f"Exec,{time.perf_counter() - t1:.3f},Prepare,{t1 - t0:.3f}")
            # ... conversion of the detection outputs into the final "result" tensor is
            # elided in the original post ...

        return processed_responses  # built in the elided section above

...................
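The response-conversion step is elided above. Purely for illustration, here is a hedged sketch of what such a step might look like, using only Python backend utilities that exist (pb_utils.get_output_tensor_by_name, Tensor.is_cpu, Tensor.as_numpy); the post-processing logic itself (picking a best score) is hypothetical and not the author's code. It assumes the imports from model.py above and that it is added as a method of TritonPythonModel.

    def convert_detection_response(self, infer_response):
        # Hypothetical sketch, not the author's implementation.
        scores = []
        for name in ["score_8", "score_16", "score_32"]:
            out = pb_utils.get_output_tensor_by_name(infer_response, name)
            # BLS output tensors produced by a GPU model may live in GPU memory;
            # as_numpy() only works for CPU tensors, which is exactly where the
            # cudaIpcHandle transfer in this issue comes into play.
            if out.is_cpu():
                scores.append(out.as_numpy().reshape(-1))
            else:
                # Hypothetical fallback: the tensor would need to be moved to host
                # (e.g. via DLPack and a framework such as PyTorch) before use.
                raise pb_utils.TritonModelException(f"{name} returned in GPU memory")
        best_score = float(np.concatenate(scores).max())
        # Build the "result" output declared in the py_main_model config ([-1, 1], FP32).
        result = pb_utils.Tensor("result", np.array([[best_score]], dtype=np.float32))
        return pb_utils.InferenceResponse(output_tensors=[result])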


name: "onnx_scrfd_10g_bnkps_detection_model" platform: "onnxruntime_onnx" max_batch_size : 10

input [ { name: "input.1" data_type: TYPE_FP32 dims: [3, -1, -1] } ] output [ { name: "score_8", data_type: TYPE_FP32, dims: [ -1,1]

} ]

output [ { name: "bbox_8", data_type: TYPE_FP32, dims: [-1, 4]

}]

output [ { name: "kps_8", data_type: TYPE_FP32, dims: [ -1,10]

} ]

output [ { name: "score_16", data_type: TYPE_FP32, dims: [ -1,1]

} ]

output [ { name: "bbox_16", data_type: TYPE_FP32, dims: [-1, 4]

} ]

output [ { name: "kps_16", data_type: TYPE_FP32, dims: [ -1,10]

} ]

output [ { name: "score_32", data_type: TYPE_FP32, dims: [ -1,1]

} ]

output [ { name: "bbox_32", data_type: TYPE_FP32, dims: [ -1,4]

} ]

output [ { name: "kps_32", data_type: TYPE_FP32, dims:[-1,10]

} ]

instance_group: [ { name: "onnx_scrfd_10g_bnkps_detection_model", kind: KIND_GPU, count: 1

    }

]

Client Logs

Traceback (most recent call last):
  File "C:\projects\FRApp-Triton\Python\client\client_triton_native.py", line 47, in <module>
    response = client.infer(model_name,
  File "C:\ProgramData\Anaconda3\envs\fr_onnx_env\lib\site-packages\tritonclient\grpc\__init__.py", line 1446, in infer
    raise_error_grpc(rpc_error)
  File "C:\ProgramData\Anaconda3\envs\fr_onnx_env\lib\site-packages\tritonclient\grpc\__init__.py", line 76, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'fr_main_model', message: TritonModelException: Failed to open the cudaIpcHandle. error: invalid argument

At: /models/fr_main_model/1/model.py(86): execute
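The client script referenced in the traceback is not included in the thread. For context, a minimal tritonclient.grpc call matching the py_main_model config above might look roughly like the following hedged sketch; the server URL, image path, and variable names are assumptions, and the deployed model is named fr_main_model in the logs rather than py_main_model.

import base64
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Both inputs are TYPE_STRING (BYTES) with dims [-1] and max_batch_size 10,
# so send shape [1, 1]: one batch element containing one string each.
image_name = np.array([["sample.jpg".encode("utf-8")]], dtype=np.object_)
with open("sample.jpg", "rb") as f:  # hypothetical image file
    image_b64 = np.array([[base64.b64encode(f.read())]], dtype=np.object_)

inputs = [
    grpcclient.InferInput("image_name", [1, 1], "BYTES"),
    grpcclient.InferInput("image_bytes_b46", [1, 1], "BYTES"),
]
inputs[0].set_data_from_numpy(image_name)
inputs[1].set_data_from_numpy(image_b64)

outputs = [grpcclient.InferRequestedOutput("result")]

response = client.infer("py_main_model", inputs, request_id="101", outputs=outputs)
print(response.as_numpy("result"))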

Server Logs

I0310 03:49:01.579260 1 model_lifecycle.cc:428] AsyncLoad() 'fr_main_model' I0310 03:49:01.588920 1 model_lifecycle.cc:459] loading: fr_main_model:1 I0310 03:49:01.589062 1 model_lifecycle.cc:509] CreateModel() 'fr_main_model' version 1 I0310 03:49:01.593124 1 backend_model.cc:348] Adding default backend config setting: default-max-batch-size,4 I0310 03:49:01.593206 1 python_be.cc:1814] TRITONBACKEND_ModelInitialize: fr_main_model (version 1) I0310 03:49:01.606389 1 stub_launcher.cc:251] Starting Python backend stub: exec /opt/tritonserver/backends/python/triton_python_backend_stub /models/fr_main_model/1/model.py triton_python_backend_shm_region_13 67108864 67108864 1 /opt/tritonserver/backends/python 336 fr_main_model I0310 03:49:03.199017 1 python_be.cc:1594] model configuration: { "name": "fr_main_model", "platform": "", "backend": "python", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 10, "input": [ { "name": "image_name", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "image_bytes_b46", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false } ], "output": [ { "name": "result", "data_type": "TYPE_FP32", "dims": [ -1, 1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "fr_main_model", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "model.py", "cc_model_filenames": {}, "metric_tags": {}, "parameters": {}, "model_warmup": [] } I0310 03:49:03.204836 1 python_be.cc:1858] TRITONBACKEND_ModelInstanceInitialize: fr_main_model (CPU device 0) I0310 03:49:03.204888 1 backend_model_instance.cc:68] Creating instance fr_main_model on CPU using artifact 'model.py' I0310 03:49:03.219671 1 stub_launcher.cc:251] Starting Python backend stub: exec /opt/tritonserver/backends/python/triton_python_backend_stub /models/fr_main_model/1/model.py triton_python_backend_shm_region_14 67108864 67108864 1 /opt/tritonserver/backends/python 336 fr_main_model I0310 03:49:03.559618 1 model.py:53] FR Python Model is Loaded !! I0310 03:49:03.562520 1 python_be.cc:1879] TRITONBACKEND_ModelInstanceInitialize: instance initialization successful fr_main_model (device 0) I0310 03:49:03.562663 1 backend_model_instance.cc:766] Starting backend thread for fr_main_model at nice 0 on device 0... I0310 03:49:03.563025 1 model_lifecycle.cc:694] successfully loaded 'fr_main_model' version 1 I0310 03:49:03.563099 1 backend_model_instance.cc:789] Stopping backend thread for fr_main_model... I0310 03:49:03.563114 1 model_lifecycle.cc:285] VersionStates() 'fr_main_model' I0310 03:49:03.563225 1 python_be.cc:1998] TRITONBACKEND_ModelInstanceFinalize: delete instance state <class 'c_python_backend_utils.Logger'> Cleaning up... 
I0310 03:49:04.776534 1 python_be.cc:1837] TRITONBACKEND_ModelFinalize: delete model state I0310 03:49:04.776595 1 model_lifecycle.cc:577] OnDestroy callback() 'fr_main_model' version 1 I0310 03:49:04.776601 1 model_lifecycle.cc:579] successfully unloaded 'fr_main_model' version 1 I0310 03:49:18.563314 1 server.cc:334] Polling model repository I0310 03:49:33.706748 1 server.cc:334] Polling model repository I0310 03:49:46.116211 1 grpc_server.cc:3848] Process for ModelInferHandler, rpc_ok=1, 0 step START I0310 03:49:46.116262 1 grpc_server.cc:3841] New request handler for ModelInferHandler, 0 I0310 03:49:46.116272 1 model_lifecycle.cc:327] GetModel() 'fr_main_model' version -1 I0310 03:49:46.116280 1 model_lifecycle.cc:327] GetModel() 'fr_main_model' version -1 I0310 03:49:46.116303 1 infer_request.cc:729] [request id: 101] prepared: [0x0x7f8634004b00] request id: 101, model: fr_main_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0 original inputs: [0x0x7f8634005908] input: image_bytes_b46, type: BYTES, original shape: [1,1], batch + shape: [1,1], shape: [1] [0x0x7f863400a158] input: image_name, type: BYTES, original shape: [1,1], batch + shape: [1,1], shape: [1] override inputs: inputs: [0x0x7f863400a158] input: image_name, type: BYTES, original shape: [1,1], batch + shape: [1,1], shape: [1] [0x0x7f8634005908] input: image_bytes_b46, type: BYTES, original shape: [1,1], batch + shape: [1,1], shape: [1] original requested outputs: result requested outputs: result

I0310 03:49:46.116410 1 python_be.cc:1094] model fr_main_model, instance fr_main_model, executing 1 requests I0310 03:49:46.170530 1 model_lifecycle.cc:327] GetModel() 'onnx_scrfd_10g_bnkps_detection_model' version -1 I0310 03:49:46.175708 1 model_lifecycle.cc:327] GetModel() 'onnx_scrfd_10g_bnkps_detection_model' version -1 I0310 03:49:46.175752 1 model_lifecycle.cc:327] GetModel() 'onnx_scrfd_10g_bnkps_detection_model' version -1 I0310 03:49:46.176831 1 infer_request.cc:729] [request id: 1] prepared: [0x0x7f86d4001f50] request id: 1, model: onnx_scrfd_10g_bnkps_detection_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0 original inputs: [0x0x7f86d4002218] input: input.1, type: FP32, original shape: [1,3,640,640], batch + shape: [1,3,640,640], shape: [3,640,640] override inputs: inputs: [0x0x7f86d4002218] input: input.1, type: FP32, original shape: [1,3,640,640], batch + shape: [1,3,640,640], shape: [3,640,640] original requested outputs: bbox_16 bbox_32 bbox_8 kps_16 kps_32 kps_8 score_16 score_32 score_8 requested outputs: bbox_16 bbox_32 bbox_8 kps_16 kps_32 kps_8 score_16 score_32 score_8

I0310 03:49:46.179897 1 onnxruntime.cc:2672] model onnx_scrfd_10g_bnkps_detection_model, instance onnx_scrfd_10g_bnkps_detection_model_0, executing 1 requests I0310 03:49:46.179935 1 onnxruntime.cc:1469] TRITONBACKEND_ModelExecute: Running onnx_scrfd_10g_bnkps_detection_model_0 with 1 requests 2023-03-10 03:49:46.272369552 [I:onnxruntime:log, bfc_arena.cc:26 BFCArena] Creating BFCArena for Cuda with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 memory limit: 18446744073709551615 arena_extend_strategy: 0 2023-03-10 03:49:46.275071883 [V:onnxruntime:log, bfc_arena.cc:62 BFCArena] Creating 21 bins of max chunk size 256 to 268435456 2023-03-10 03:49:46.275389987 [I:onnxruntime:log, bfc_arena.cc:317 AllocateRawInternal] Extending BFCArena for Cuda. bin_num:14 (requested) num_bytes: 4915200 (actual) rounded_bytes:4915200 2023-03-10 03:49:46.276985905 [I:onnxruntime:log, bfc_arena.cc:197 Extend] Extended allocation by 8388608 bytes. 2023-03-10 03:49:46.277021006 [I:onnxruntime:log, bfc_arena.cc:200 Extend] Total allocated bytes: 8388608 2023-03-10 03:49:46.277028406 [I:onnxruntime:log, bfc_arena.cc:203 Extend] Allocated memory at 0xb53c00000 to 0xb54400000 2023-03-10 03:49:46.290202358 [I:onnxruntime:, sequential_executor.cc:176 Execute] Begin execution 2023-03-10 03:49:46.293022590 [I:onnxruntime:log, bfc_arena.cc:317 AllocateRawInternal] Extending BFCArena for Cuda. bin_num:15 (requested) num_bytes: 11468800 (actual) rounded_bytes:11468800 2023-03-10 03:49:46.293763699 [I:onnxruntime:log, bfc_arena.cc:197 Extend] Extended allocation by 16777216 bytes. 2023-03-10 03:49:46.293805399 [I:onnxruntime:log, bfc_arena.cc:200 Extend] Total allocated bytes: 25165824 2023-03-10 03:49:46.293815900 [I:onnxruntime:log, bfc_arena.cc:203 Extend] Allocated memory at 0xb54400000 to 0xb55400000 2023-03-10 03:49:46.334326567 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:48.645325223 [I:onnxruntime:log, bfc_arena.cc:317 AllocateRawInternal] Extending BFCArena for Cuda. bin_num:11 (requested) num_bytes: 618624 (actual) rounded_bytes:618752 2023-03-10 03:49:48.645912229 [I:onnxruntime:log, bfc_arena.cc:197 Extend] Extended allocation by 16777216 bytes. 2023-03-10 03:49:48.645953630 [I:onnxruntime:log, bfc_arena.cc:200 Extend] Total allocated bytes: 41943040 2023-03-10 03:49:48.645964130 [I:onnxruntime:log, bfc_arena.cc:203 Extend] Allocated memory at 0xb55400000 to 0xb56400000 2023-03-10 03:49:48.648471459 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:48.709687565 [I:onnxruntime:log, bfc_arena.cc:317 AllocateRawInternal] Extending BFCArena for Cuda. bin_num:11 (requested) num_bytes: 643840 (actual) rounded_bytes:643840 2023-03-10 03:49:48.710296172 [I:onnxruntime:log, bfc_arena.cc:197 Extend] Extended allocation by 33554432 bytes. 
2023-03-10 03:49:48.710336072 [I:onnxruntime:log, bfc_arena.cc:200 Extend] Total allocated bytes: 75497472 2023-03-10 03:49:48.710346673 [I:onnxruntime:log, bfc_arena.cc:203 Extend] Allocated memory at 0xb66200000 to 0xb68200000 2023-03-10 03:49:48.711128782 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:48.809422715 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 I0310 03:49:48.856349 1 server.cc:334] Polling model repository 2023-03-10 03:49:48.871865836 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:48.926574867 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:48.974499519 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:48.981579901 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:48.988586882 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.001684933 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.009385222 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.015513693 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.025537008 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.030749168 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.035117219 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.039637271 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.044339925 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.048902978 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.053202027 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.058668390 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.062754437 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.067573093 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.073085757 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.077894612 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.081744956 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.086769114 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.090807361 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.096415126 
[I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.101409283 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.107921458 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.112702114 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.116719060 [I:onnxruntime:log, bfc_arena.cc:317 AllocateRawInternal] Extending BFCArena for CUDA_CPU. bin_num:0 (requested) num_bytes: 32 (actual) rounded_bytes:256 2023-03-10 03:49:49.117431268 [I:onnxruntime:log, bfc_arena.cc:197 Extend] Extended allocation by 1048576 bytes. 2023-03-10 03:49:49.117466768 [I:onnxruntime:log, bfc_arena.cc:200 Extend] Total allocated bytes: 1048576 2023-03-10 03:49:49.117474269 [I:onnxruntime:log, bfc_arena.cc:203 Extend] Allocated memory at 0x7f85ea4f11a0 to 0x7f85ea5f11a0 2023-03-10 03:49:49.118298678 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.126844577 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.132116837 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.136446887 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.140996940 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.145188588 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.148632128 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.156911923 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.160916970 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.164905516 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.169633570 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.174097622 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.181343305 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.188075083 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.195068364 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.199037509 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.205609585 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.210187938 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.215340897 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.220104052 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving 
memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.224429302 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.232321393 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.237067248 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.243079017 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.248245377 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.253628739 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 2023-03-10 03:49:49.258519795 [I:onnxruntime:log, bfc_arena.cc:267 Reserve] Reserving memory in BFCArena for Cuda size: 33554432 I0310 03:49:49.263064 1 infer_response.cc:167] add response output: output: bbox_16, type: FP32, shape: [1,3200,4] I0310 03:49:49.265057 1 infer_response.cc:167] add response output: output: bbox_32, type: FP32, shape: [1,800,4] I0310 03:49:49.265146 1 infer_response.cc:167] add response output: output: bbox_8, type: FP32, shape: [1,12800,4] I0310 03:49:49.265207 1 infer_response.cc:167] add response output: output: kps_16, type: FP32, shape: [1,3200,10] I0310 03:49:49.265252 1 infer_response.cc:167] add response output: output: kps_32, type: FP32, shape: [1,800,10] I0310 03:49:49.265319 1 infer_response.cc:167] add response output: output: kps_8, type: FP32, shape: [1,12800,10] I0310 03:49:49.266020 1 infer_response.cc:167] add response output: output: score_16, type: FP32, shape: [1,3200,1] I0310 03:49:49.266107 1 infer_response.cc:167] add response output: output: score_32, type: FP32, shape: [1,800,1] I0310 03:49:49.266153 1 infer_response.cc:167] add response output: output: score_8, type: FP32, shape: [1,12800,1] I0310 03:49:49.669613 1 grpc_server.cc:4012] ModelInferHandler::InferResponseComplete, 0 step ISSUED I0310 03:49:49.669897 1 grpc_server.cc:3848] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE I0310 03:49:49.670268 1 grpc_server.cc:2758] Done for ModelInferHandler, 0 I0310 03:49:49.669914 1 grpc_server.cc:3566] ModelInferHandler::InferRequestComplete I0310 03:49:49.670586 1 python_be.cc:1980] TRITONBACKEND_ModelInstanceExecute: model instance name fr_main_model released 1 requests I0310 03:50:03.995909 1 server.cc:334] Polling model repository I0310 03:50:19.118287 1 server.cc:334] Polling model repository I0310 03:50:34.259362 1 server.cc:334] Polling model repository I0310 03:50:49.414143 1 server.cc:334] Polling model repository I0310 03:51:04.543423 1 server.cc:334] Polling model repository I0310 03:51:19.688254 1 server.cc:334] Polling model repository I0310 03:51:34.827078 1 server.cc:334] Polling model repository I0310 03:51:49.962146 1 server.cc:334] Polling model repository

Expected behavior: The ONNX model should pass its results back to the Python model.

Tabrizian commented 1 year ago

We have filed a ticket and @krishung5 is going to look into this issue.

BasharMahasen commented 1 year ago

@Tabrizian, @krishung5, I can see there is a similar ticket #5471 and your response was that you would investigate. Meanwhile, could you kindly advise a working Triton version which supports calling an ONNX model through BLS? In my case, I was able to call the ONNX model directly; however, it failed with the cudaIpcHandle error when I called it from a Python backend.

BasharMahasen commented 1 year ago

Hello @Tabrizian, I would really appreciate your support on this issue. We are using gRPC at the moment, but it is too slow and puts extra load on the CPU to copy / transfer the data.
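As a side note on the copy overhead: when the client runs on the same host as the server (and, for a containerized server, shares /dev/shm), tritonclient can register shared-memory regions so tensors are not serialized over gRPC. Below is a rough, hedged sketch for the FP32 "result" output only; the region names are made up and exact signatures may differ slightly between tritonclient versions.

import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.shared_memory as shm

client = grpcclient.InferenceServerClient(url="localhost:8001")
client.unregister_system_shared_memory()  # start from a clean state

# Create and register a system shared-memory region large enough for the
# "result" output (TYPE_FP32, dims [-1, 1]; assume a single value here).
result_byte_size = 4
shm_handle = shm.create_shared_memory_region("result_region", "/result_shm", result_byte_size)
client.register_system_shared_memory("result_region", "/result_shm", result_byte_size)

output = grpcclient.InferRequestedOutput("result")
output.set_shared_memory("result_region", result_byte_size)

# ... build the BYTES inputs as usual and call
# client.infer("py_main_model", inputs, outputs=[output]) ...

# After the call, read the output directly from shared memory instead of the gRPC payload.
result = shm.get_contents_as_numpy(shm_handle, np.float32, [1, 1])

# Clean up when done.
client.unregister_system_shared_memory("result_region")
shm.destroy_shared_memory_region(shm_handle)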

Tabrizian commented 1 year ago

Hi @BasharMahasen, I looked at https://github.com/triton-inference-server/server/issues/5471 but I was not able to reproduce it. Could you please share your model repository and your client so that we can look into this issue? Also, please provide the CUDA driver version by sharing the output of nvidia-smi outside the container.

BasharMahasen commented 1 year ago

Hello @Tabrizian, please find the attached cudaIPC_Error_working_sample.zip, which contains the model repo, client, server log, and nvidia-smi output (taken outside the container).

krishung5 commented 1 year ago

@BasharMahasen Thanks for providing the details for repro! I will look into this and update here once I have more information.

krishung5 commented 1 year ago

@BasharMahasen I was not able to reproduce the issue using the files you shared. Please find the server and client logs attached (server.log.log, client.log.log). It worked for me with both the 23.02 and 23.03 Triton containers. Could you check nvidia-smi inside the container to confirm that the GPU is correctly exposed to it? Also, could you try setting the container network to host? I'm not sure whether it helps in this case, but I found some similar issues that were resolved after switching to the host network, e.g. https://github.com/NVIDIA/nccl/issues/360#issuecomment-670650867

Thytu commented 1 year ago

Hello, we encountered the same issue, but only with driver version 470.182.03. Note that it works fine with 500+ drivers.

krishung5 commented 1 year ago

Closing as unable to reproduce the issue. Please let us know if there is any update on the repro steps or if you'd like to follow up.

Grople commented 1 year ago

Hello, I also encountered the same issue, but with the tritonserver:23.08-py container.

kbegiedza commented 10 months ago

Same for me on 23.02-py, using a custom BLS model with a few parallel TensorRT inferences.

asafberreby commented 6 months ago

Same for me with 22.12