triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

I don't know what to do. #7580

Open choi119 opened 2 weeks ago

choi119 commented 2 weeks ago

Description
A clear and concise description of what the bug is.

(attached image: output_image)

Triton Information
What version of Triton are you using?

triton 2.37.0

docker run --gpus=all -it --runtime=nvidia --rm --shm-size=2g --memory=16g -p 8000:8000 -p 8001:8001 -p 8002:8002 --name triton --net=host --privileged -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-6.4 nvcr.io/nvidia/deepstream:6.4-triton-multiarch

root@arstest:/opt/nvidia/deepstream/deepstream-6.4/samples/configs/tao_pretrained_models/triton# /opt/tritonserver/bin/tritonserver --model-repository=/opt/nvidia/deepstream/deepstream/samples/configs/tao_pretrained_models/triton
I0829 03:48:40.853740 1161 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I0829 03:48:40.853768 1161 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I0829 03:48:40.853772 1161 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
I0829 03:48:40.982268 1161 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x759cca000000' with size 268435456
I0829 03:48:40.982554 1161 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0829 03:48:40.984546 1161 model_lifecycle.cc:462] loading: vehiclemakenet:1
I0829 03:48:40.984603 1161 model_lifecycle.cc:462] loading: vehicletypenet:1
I0829 03:48:40.984672 1161 model_lifecycle.cc:462] loading: trafficcamnet:1
I0829 03:48:40.984731 1161 model_lifecycle.cc:462] loading: peopleNet:1
I0829 03:48:40.984773 1161 model_lifecycle.cc:462] loading: ssd:1
I0829 03:48:40.985036 1161 tensorrt.cc:65] TRITONBACKEND_Initialize: tensorrt
I0829 03:48:40.985046 1161 tensorrt.cc:75] Triton TRITONBACKEND API version: 1.15
I0829 03:48:40.985052 1161 tensorrt.cc:81] 'tensorrt' TRITONBACKEND API version: 1.15
I0829 03:48:40.985057 1161 tensorrt.cc:105] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0829 03:48:40.985918 1161 tensorrt.cc:222] TRITONBACKEND_ModelInitialize: vehicletypenet (version 1)
I0829 03:48:40.986075 1161 tensorrt.cc:222] TRITONBACKEND_ModelInitialize: vehiclemakenet (version 1)
I0829 03:48:40.986087 1161 tensorrt.cc:222] TRITONBACKEND_ModelInitialize: trafficcamnet (version 1)
I0829 03:48:40.986096 1161 tensorrt.cc:222] TRITONBACKEND_ModelInitialize: peopleNet (version 1)
I0829 03:48:40.986712 1161 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I0829 03:48:40.986723 1161 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I0829 03:48:40.986738 1161 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I0829 03:48:40.991610 1161 logging.cc:46] Loaded engine size: 1 MiB
I0829 03:48:40.994543 1161 logging.cc:46] Loaded engine size: 4 MiB
I0829 03:48:40.995057 1161 logging.cc:46] Loaded engine size: 5 MiB
I0829 03:48:40.999939 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +13, now: CPU 0, GPU 13 (MiB)
I0829 03:48:40.999939 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12, now: CPU 0, GPU 13 (MiB)
I0829 03:48:41.000927 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12, now: CPU 0, GPU 12 (MiB)
I0829 03:48:41.001890 1161 tensorrt.cc:288] TRITONBACKEND_ModelInstanceInitialize: vehiclemakenet_0_0 (GPU device 0)
I0829 03:48:41.002115 1161 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I0829 03:48:41.002119 1161 tensorrt.cc:288] TRITONBACKEND_ModelInstanceInitialize: vehicletypenet_0_0 (GPU device 0)
I0829 03:48:41.002383 1161 tensorrt.cc:288] TRITONBACKEND_ModelInstanceInitialize: trafficcamnet_0_0 (GPU device 0)
I0829 03:48:41.005719 1161 logging.cc:46] Loaded engine size: 4 MiB
I0829 03:48:41.008178 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
I0829 03:48:41.008181 1161 logging.cc:46] Loaded engine size: 21 MiB
I0829 03:48:41.012650 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +30, now: CPU 0, GPU 34 (MiB)
I0829 03:48:41.012942 1161 instance_state.cc:188] Created instance vehiclemakenet_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0829 03:48:41.013085 1161 model_lifecycle.cc:819] successfully loaded 'vehiclemakenet'
I0829 03:48:41.013139 1161 logging.cc:46] Loaded engine size: 5 MiB
I0829 03:48:41.013508 1161 tensorrt.cc:222] TRITONBACKEND_ModelInitialize: ssd (version 1)
I0829 03:48:41.013824 1161 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I0829 03:48:41.015849 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +8, now: CPU 0, GPU 42 (MiB)
I0829 03:48:41.015881 1161 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I0829 03:48:41.015939 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +38, now: CPU 0, GPU 42 (MiB)
I0829 03:48:41.016319 1161 logging.cc:46] Loaded engine size: 1 MiB
I0829 03:48:41.018076 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +11, now: CPU 0, GPU 53 (MiB)
I0829 03:48:41.018442 1161 instance_state.cc:188] Created instance vehicletypenet_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0829 03:48:41.018596 1161 model_lifecycle.cc:819] successfully loaded 'vehicletypenet'
I0829 03:48:41.019716 1161 tensorrt.cc:288] TRITONBACKEND_ModelInstanceInitialize: peopleNet_0_0 (GPU device 0)
I0829 03:48:41.020570 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU -9, now: CPU 0, GPU 33 (MiB)
I0829 03:48:41.020589 1161 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I0829 03:48:41.021179 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +9, now: CPU 0, GPU 42 (MiB)
I0829 03:48:41.021380 1161 instance_state.cc:188] Created instance trafficcamnet_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0829 03:48:41.021484 1161 model_lifecycle.cc:819] successfully loaded 'trafficcamnet'
I0829 03:48:41.039169 1161 logging.cc:46] Loaded engine size: 21 MiB
I0829 03:48:41.044031 1161 logging.cc:46] Loaded engine size: 34 MiB
I0829 03:48:41.050569 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +58, now: CPU 0, GPU 100 (MiB)
I0829 03:48:41.051723 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +17, now: CPU 1, GPU 117 (MiB)
I0829 03:48:41.051993 1161 instance_state.cc:188] Created instance peopleNet_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0829 03:48:41.052203 1161 model_lifecycle.cc:819] successfully loaded 'peopleNet'
I0829 03:48:41.060684 1161 logging.cc:46] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +34, now: CPU 156, GPU 472 (MiB)
I0829 03:48:41.064121 1161 logging.cc:46] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 158, GPU 482 (MiB)
I0829 03:48:41.065446 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +1, GPU +56, now: CPU 1, GPU 119 (MiB)
I0829 03:48:41.069085 1161 tensorrt.cc:288] TRITONBACKEND_ModelInstanceInitialize: ssd_0_0 (GPU device 0)
I0829 03:48:41.069353 1161 logging.cc:46] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
I0829 03:48:41.095817 1161 logging.cc:46] Loaded engine size: 34 MiB
I0829 03:48:41.102973 1161 logging.cc:46] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 158, GPU 474 (MiB)
I0829 03:48:41.104030 1161 logging.cc:46] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 158, GPU 482 (MiB)
I0829 03:48:41.105610 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +36, now: CPU 1, GPU 119 (MiB)
I0829 03:48:41.108460 1161 logging.cc:46] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 89, GPU 476 (MiB)
I0829 03:48:41.109461 1161 logging.cc:46] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 89, GPU 484 (MiB)
I0829 03:48:41.110657 1161 logging.cc:46] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +88, now: CPU 1, GPU 207 (MiB)
I0829 03:48:41.110891 1161 instance_state.cc:188] Created instance ssd_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0829 03:48:41.111110 1161 model_lifecycle.cc:819] successfully loaded 'ssd'
I0829 03:48:41.111164 1161 server.cc:604] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0829 03:48:41.111286 1161 server.cc:631] 
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
| Backend  | Path                                                      | Config                                                                                                                          |
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
| pytorch  | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so   | {}                                                                                                                              |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000", |
|          |                                                           | "default-max-batch-size":"4"}}                                                                                                  |
+----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+

I0829 03:48:41.111358 1161 server.cc:674] 
+----------------+---------+--------+
| Model          | Version | Status |
+----------------+---------+--------+
| peopleNet      | 1       | READY  |
| ssd            | 1       | READY  |
| trafficcamnet  | 1       | READY  |
| vehiclemakenet | 1       | READY  |
| vehicletypenet | 1       | READY  |
+----------------+---------+--------+

I0829 03:48:41.138442 1161 metrics.cc:810] Collecting metrics for GPU 0: NVIDIA GeForce GTX 1650
I0829 03:48:41.138637 1161 metrics.cc:703] Collecting CPU metrics
I0829 03:48:41.138762 1161 tritonserver.cc:2435] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                               |
| server_version                   | 2.37.0                                                                                                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tens |
|                                  | or_data parameters statistics trace logging                                                                                                                          |
| model_repository_path[0]         | /opt/nvidia/deepstream/deepstream/samples/configs/tao_pretrained_models/triton                                                                                       |
| model_control_mode               | MODE_NONE                                                                                                                                                            |
| strict_model_config              | 0                                                                                                                                                                    |
| rate_limit                       | OFF                                                                                                                                                                  |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                            |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                  |
| strict_readiness                 | 1                                                                                                                                                                    |
| exit_timeout                     | 30                                                                                                                                                                   |
| cache_enabled                    | 0                                                                                                                                                                    |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0829 03:48:41.170609 1161 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
I0829 03:48:41.171207 1161 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
I0829 03:48:41.212681 1161 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002

Are you using the Triton container or did you build it yourself?

To Reproduce
Steps to reproduce the behavior.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

Expected behavior
A clear and concise description of what you expected to happen.

I would like to make an inference request to peopleNet on the Triton server and visualize the results, but I keep running into problems. Please take a look at this code.

I don't know what to do.
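
Before digging into the postprocessing, it can help to confirm the model's actual input/output names, datatypes, and shapes straight from the server. A minimal sketch using the same tritonclient.grpc package (the model name is taken from the server logs above; both calls are standard client API):

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
# Print the tensor names, datatypes, and shapes the server actually expects,
# plus the (auto-completed) model configuration.
print(client.get_model_metadata(model_name="peopleNet", model_version="1"))
print(client.get_model_config(model_name="peopleNet", model_version="1"))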


# test1.py 
import tritonclient.grpc as grpcclient
import numpy as np
import cv2
from PIL import Image, ImageDraw

TRITON_URL = "localhost:8001"  
triton_client = grpcclient.InferenceServerClient(url=TRITON_URL)

def preprocess_image(image):
    print("Preprocessing image...")
    target_size = (960, 544) 
    image = image.resize(target_size, Image.BILINEAR)
    image_array = np.array(image).astype(np.float32)
    image_array *= 1.0 / 255  # Normalize (1/255)
    image_array = image_array.transpose(2, 0, 1)  # HWC to CHW
    image_array = image_array[np.newaxis, :]  # Add batch dimension
    expected_shape = (1, 3, 544, 960)
    if image_array.shape != expected_shape:
        raise ValueError(f"Unexpected image shape: {image_array.shape}. Expected {expected_shape}.")
    return image_array

def infer_with_triton(image):
    print("Starting inference with Triton...")
    image_array = preprocess_image(image)
    inputs = [grpcclient.InferInput("input_1:0", image_array.shape, "FP32")]
    inputs[0].set_data_from_numpy(image_array)
    outputs = [
        grpcclient.InferRequestedOutput("output_bbox/BiasAdd:0"),
        grpcclient.InferRequestedOutput("output_cov/Sigmoid:0")
    ]
    try:
        response = triton_client.infer(model_name="peopleNet", model_version="1", inputs=inputs, outputs=outputs)
        bbox_output = response.as_numpy("output_bbox/BiasAdd:0")
        class_output = response.as_numpy("output_cov/Sigmoid:0")
        print("Inference completed successfully.")
        return bbox_output, class_output
    except grpcclient.InferenceServerException as e:
        print(f"An error occurred: {e}")
        raise

def postprocess_results(bbox_output, class_output, iou_threshold=0.4):
    print("Postprocessing results...")
    try:
        num_boxes = bbox_output.shape[1]
        height, width = bbox_output.shape[2], bbox_output.shape[3]
        boxes = []
        scores = []
        for i in range(num_boxes):
            bbox = bbox_output[0, i, :, :].flatten()
            class_probs = class_output[0, :, :, :].flatten()
            class_id = np.argmax(class_probs)
            confidence = class_probs[class_id]
            x_min, y_min = bbox[0] * width, bbox[1] * height
            x_max, y_max = bbox[2] * width, bbox[3] * height
            if confidence > 0.2:  # matches the confidence_threshold used in NMSBoxes below
                boxes.append([int(x_min), int(y_min), int(x_max - x_min), int(y_max - y_min)])
                scores.append(float(confidence))

        if len(boxes) == 0:
            print("No boxes found with confidence above the threshold.")
            return []

        indices = cv2.dnn.NMSBoxes(boxes, scores, score_threshold=0.2, nms_threshold=iou_threshold)
        if len(indices) == 0:
            print("No boxes remain after Non-Maximum Suppression.")
            return []

        indices = indices.flatten() if indices.ndim > 1 else indices

        selected_boxes = [boxes[i] for i in indices]

        return [{"class_id": np.argmax(class_output[0, :, :, :].flatten()), "confidence": scores[i], "bbox": selected_boxes[i]} for i in range(len(selected_boxes))]

    except Exception as e:
        print(f"An error occurred during postprocessing: {e}")
        raise

def visualize_output(image_path, boxes):
    print("Visualizing output...")
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for box in boxes:
        x_min, y_min, x_max, y_max = box["bbox"]
        class_id = box["class_id"]
        confidence = box["confidence"]
        draw.rectangle([x_min, y_min, x_max, y_max], outline="red", width=2)
        draw.text((x_min, y_min), f"Class {class_id} ({confidence:.2f})", fill="red")
    output_image_path = '/home/arstest/peoplesample_output.jpg'
    image.save(output_image_path)
    print(f"Output image saved to {output_image_path}")
    print("Displaying the output image...")
    image.show()

def main():
    print("Starting the application...")
    image_path = '/home/arstest/peoplesample.jpg'
    print("Opening image...")
    image = Image.open(image_path).convert("RGB")
    try:
        print("Performing inference...")
        bbox_output, class_output = infer_with_triton(image)
        boxes = postprocess_results(bbox_output, class_output)
        visualize_output(image_path, boxes)
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    main()
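
For reference, peopleNet is a DetectNet_v2 model: output_cov/Sigmoid:0 is a per-class confidence grid and output_bbox/BiasAdd:0 holds four box offsets per class per grid cell (for a 960x544 input the grid is 60x34), not a ready-made list of boxes, so treating bbox_output.shape[1] as a box count will not produce meaningful coordinates. Below is a hedged sketch of the grid-cell decoding used by NVIDIA's DeepStream sample parsers; the stride=16 and bbox_norm=35.0 values are assumptions carried over from those samples, not confirmed anywhere in this thread:

import numpy as np

def decode_detectnet_v2(bbox_output, cov_output, stride=16, bbox_norm=35.0,
                        conf_threshold=0.2):
    # bbox_output: (num_classes * 4, grid_h, grid_w), batch dim already removed
    # cov_output:  (num_classes,     grid_h, grid_w)
    num_classes, grid_h, grid_w = cov_output.shape
    # Grid-cell centers in network-input pixels, scaled by bbox_norm.
    cx = (np.arange(grid_w) * stride + 0.5) / bbox_norm
    cy = (np.arange(grid_h) * stride + 0.5) / bbox_norm
    boxes, scores, class_ids = [], [], []
    for c in range(num_classes):
        for y in range(grid_h):
            for x in range(grid_w):
                conf = float(cov_output[c, y, x])
                if conf < conf_threshold:
                    continue
                o = bbox_output[c * 4:c * 4 + 4, y, x]
                x1 = (o[0] - cx[x]) * -bbox_norm
                y1 = (o[1] - cy[y]) * -bbox_norm
                x2 = (o[2] + cx[x]) * bbox_norm
                y2 = (o[3] + cy[y]) * bbox_norm
                # [x, y, w, h] is the convention cv2.dnn.NMSBoxes expects.
                boxes.append([int(x1), int(y1), int(x2 - x1), int(y2 - y1)])
                scores.append(conf)
                class_ids.append(c)
    return boxes, scores, class_ids

The decoded boxes are in the 960x544 network-input space, so they still need to be scaled to the original image size before drawing. Note also that test1.py stores [x, y, w, h] boxes for cv2.dnn.NMSBoxes but then unpacks box["bbox"] as [x_min, y_min, x_max, y_max] in visualize_output; one convention has to be used consistently.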

choi119 commented 2 weeks ago

I'm sorry, I don't know what to do.

KrishnanPrash commented 2 weeks ago

Hello @choi119 ,

Could you provide more information about what is not working? How are you downloading the models? What version are you using? Could you provide the output of your code snippet and any other relevant logs?

choi119 commented 1 week ago

@KrishnanPrash Hello. I made a peopleNet inference request to the Triton server from outside the container, but the output values are not correct. When I visualized them, the bounding boxes were pointing to strange places. I used the shared test1.py. What should I modify?

Inference works normally with the DeepStream app, both inside and outside the server.

But when I don't use the DeepStream app, the output is wrong.
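
A common reason for boxes pointing to a strange place when the DeepStream app is not in the loop is that the client resizes the image to 960x544 for inference but draws on the full-size original, so decoded coordinates also have to be mapped back. An illustrative sketch (the function name and net_w/net_h defaults are assumptions, not from this thread):

def scale_box_to_original(box, orig_w, orig_h, net_w=960, net_h=544):
    # box is [x, y, w, h] in network-input pixels; returns corner
    # coordinates in original-image pixels, ready for ImageDraw.rectangle.
    x, y, w, h = box
    sx, sy = orig_w / net_w, orig_h / net_h
    return [x * sx, y * sy, (x + w) * sx, (y + h) * sy]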

choi119 commented 1 week ago

@KrishnanPrash

I used this.

arstest@arstest:~$ cat deepstream.sh
docker run --gpus=all -it --runtime=nvidia --rm --shm-size=2g --memory=16g -p 8000:8000 -p 8001:8001 -p 8002:8002 --name triton --net=host --privileged -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-6.4 nvcr.io/nvidia/deepstream:6.4-triton-multiarch

arstest@arstest:~$ docker images
REPOSITORY                  TAG                    IMAGE ID       CREATED        SIZE
nvcr.io/nvidia/deepstream   6.4-triton-multiarch   a3af5eff6a88   9 months ago   16.2GB

choi119 commented 1 week ago

I used this.

/opt/nvidia/deepstream/deepstream-6.4/samples/configs/tao_pretrained_models/prepare_triton_models.sh

choi119 commented 1 week ago

@KrishnanPrash I used this.

/opt/tritonserver/bin/tritonserver --model-repository=/opt/nvidia/deepstream/deepstream/samples/configs/tao_pretrained_models/triton