triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

Triton server crash on hitting inference endpoint #258

Open vaibhavjainwiz opened 1 week ago

vaibhavjainwiz commented 1 week ago

Triton Inference Server restarts every time I hit the /infer endpoint. I am using KServe to deploy the model on K8s.

Input :

curl --location 'https://<url>/v2/models/dali/infer' \ --header 'Content-Type: application/json' \ --data '{ "inputs": [ { "name": "DALI_INPUT_0", "shape": [ 1699 ], "datatype": "UINT8", "data": [ 255, 216, 255, 224, 0, 16, 74, 70, 73, 70, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 255, 226, 1, 216, 73, 67, 67, 95, 80, 82, 79, 70, 73, 76, 69, 0, 1, 1, 0, 0, 1, 200, 0, 0, 0, 0, 4, 48, 0, 0, 109, 110, 116, 114, 82, 71, 66, 32, 88, 89, 90, 32, 7, 224, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 97, 99, 115, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 246, 214, 0, 1, 0, 0, 0, 0, 211, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 100, 101, 115, 99, 0, 0, 0, 240, 0, 0, 0, 36, 114, 88, 89, 90, 0, 0, 1, 20, 0, 0, 0, 20, 103, 88, 89, 90, 0, 0, 1, 40, 0, 0, 0, 20, 98, 88, 89, 90, 0, 0, 1, 60, 0, 0, 0, 20, 119, 116, 112, 116, 0, 0, 1, 80, 0, 0, 0, 20, 114, 84, 82, 67, 0, 0, 1, 100, 0, 0, 0, 40, 103, 84, 82, 67, 0, 0, 1, 100, 0, 0, 0, 40, 98, 84, 82, 67, 0, 0, 1, 100, 0, 0, 0, 40, 99, 112, 114, 116, 0, 0, 1, 140, 0, 0, 0, 60, 109, 108, 117, 99, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 12, 101, 110, 85, 83, 0, 0, 0, 8, 0, 0, 0, 28, 0, 115, 0, 82, 0, 71, 0, 66, 88, 89, 90, 32, 0, 0, 0, 0, 0, 0, 111, 162, 0, 0, 56, 245, 0, 0, 3, 144, 88, 89, 90, 32, 0, 0, 0, 0, 0, 0, 98, 153, 0, 0, 183, 133, 0, 0, 24, 218, 88, 89, 90, 32, 0, 0, 0, 0, 0, 0, 36, 160, 0, 0, 15, 132, 0, 0, 182, 207, 88, 89, 90, 32, 0, 0, 0, 0, 0, 0, 246, 214, 0, 1, 0, 0, 0, 0, 211, 45, 112, 97, 114, 97, 0, 0, 0, 0, 0, 4, 0, 0, 0, 2, 102, 102, 0, 0, 242, 167, 0, 0, 13, 89, 0, 0, 19, 208, 0, 0, 10, 91, 0, 0, 0, 0, 0, 0, 0, 0, 109, 108, 117, 99, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 12, 101, 110, 85, 83, 0, 0, 0, 32, 0, 0, 0, 28, 0, 71, 0, 111, 0, 111, 0, 103, 0, 108, 0, 101, 0, 32, 0, 73, 0, 110, 0, 99, 0, 46, 0, 32, 0, 50, 0, 48, 0, 49, 0, 54, 255, 219, 0, 67, 0, 255, 255, 255, 255, 255, 255, 
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 219, 0, 67, 1, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 192, 0, 17, 8, 0, 214, 0, 236, 3, 1, 34, 0, 2, 17, 1, 3, 17, 1, 255, 196, 0, 23, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 255, 196, 0, 39, 16, 1, 1, 1, 0, 1, 2, 5, 4, 3, 1, 1, 0, 0, 0, 0, 0, 0, 1, 17, 33, 49, 81, 2, 18, 65, 97, 240, 113, 129, 145, 177, 161, 209, 225, 193, 241, 255, 196, 0, 21, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 255, 196, 0, 20, 17, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 218, 0, 12, 3, 1, 0, 2, 17, 3, 17, 0, 63, 0, 136, 168, 0, 101, 92, 160, 138, 101, 48, 1, 115, 58, 241, 251, 103, 96, 42, 179, 230, 250, 38, 208, 108, 99, 111, 122, 104, 54, 48, 3, 99, 26, 186, 13, 33, 166, 128, 47, 11, 192, 50, 47, 6, 125, 193, 17, 64, 64, 5, 13, 64, 70, 181, 53, 0, 86, 227, 42, 43, 106, 198, 174, 136, 169, 178, 51, 118, 160, 22, 109, 217, 207, 237, 156, 106, 55, 192, 56, 142, 185, 62, 72, 153, 62, 111, 246, 14, 99, 121, 19, 39, 127, 159, 128, 100, 107, 39, 115, 39, 112, 64, 227, 184, 2, 160, 10, 7, 32, 44, 185, 253, 50, 3, 83, 149, 73, 209, 81, 81, 26, 1, 129, 172, 76, 84, 65, 64, 105, 89, 211, 81, 90, 19, 77, 5, 42, 106, 1, 27, 98, 54, 168, 140, 180, 153, 111, 32, 202, 53, 229, 167, 148, 25, 26, 242, 251, 158, 95, 127, 224, 25, 23, 12, 4, 84, 192, 20, 50, 153, 64, 69, 64, 15, 186, 128, 178, 247, 93, 97, 65, 68, 68, 85, 64, 84, 0, 0, 0, 22, 34, 192, 88, 211, 49, 160, 42, 91, 101, 
146, 127, 234, 180, 2, 42, 2, 111, 207, 182, 155, 252, 159, 63, 140, 78, 62, 117, 5, 212, 211, 100, 77, 128, 186, 168, 160, 51, 183, 113, 165, 69, 115, 69, 168, 168, 42, 0, 0, 10, 138, 128, 0, 0, 0, 0, 2, 192, 128, 173, 50, 208, 13, 185, 182, 2, 43, 55, 248, 128, 127, 17, 157, 237, 18, 221, 73, 160, 187, 126, 72, 125, 191, 226, 85, 244, 160, 125, 63, 13, 75, 172, 47, 191, 172, 6, 213, 34, 162, 185, 223, 84, 90, 138, 128, 214, 24, 12, 128, 10, 141, 78, 139, 130, 176, 55, 137, 130, 53, 229, 135, 150, 42, 160, 207, 150, 47, 150, 40, 42, 100, 49, 80, 24, 84, 85, 65, 191, 72, 195, 115, 164, 250, 2, 86, 47, 73, 239, 203, 126, 46, 149, 206, 243, 103, 210, 1, 47, 114, 251, 25, 103, 43, 61, 61, 253, 125, 251, 118, 128, 153, 126, 83, 203, 90, 253, 231, 29, 247, 250, 250, 150, 241, 123, 231, 40, 51, 48, 151, 148, 51, 20, 111, 195, 235, 59, 86, 153, 157, 111, 217, 164, 87, 52, 84, 84, 116, 157, 4, 84, 87, 49, 81, 81, 185, 208, 39, 65, 20, 0, 26, 0, 0, 0, 0, 24, 84, 85, 65, 169, 210, 50, 212, 232, 5, 233, 92, 239, 165, 246, 253, 58, 49, 103, 167, 222, 127, 64, 158, 107, 211, 78, 139, 47, 135, 213, 60, 87, 111, 0, 187, 61, 254, 210, 67, 102, 100, 137, 101, 157, 125, 83, 211, 64, 206, 53, 103, 54, 36, 189, 86, 113, 61, 239, 79, 167, 112, 106, 122, 222, 245, 111, 74, 78, 15, 23, 68, 86, 17, 81, 81, 208, 189, 4, 168, 172, 163, 117, 133, 70, 224, 8, 160, 32, 52, 168, 170, 128, 0, 0, 131, 0, 40, 55, 56, 156, 176, 185, 178, 115, 152, 13, 37, 154, 112, 3, 23, 241, 127, 100, 201, 121, 149, 171, 202, 101, 244, 191, 144, 75, 118, 158, 153, 12, 189, 162, 243, 223, 62, 128, 153, 157, 127, 31, 219, 82, 122, 222, 169, 38, 40, 42, 91, 58, 46, 198, 120, 230, 162, 162, 4, 84, 116, 61, 69, 130, 167, 137, 136, 215, 137, 32, 141, 9, 201, 202, 40, 28, 128, 162, 128, 156, 156, 170, 130, 10, 3, 152, 81, 80, 51, 231, 229, 103, 88, 80, 103, 231, 232, 34, 241, 220, 19, 162, 234, 31, 168, 11, 169, 162, 2, 234, 40, 8, 190, 91, 223, 246, 53, 230, 153, 254, 127, 160, 205, 224, 133, 230, 147, 168, 58, 39, 70, 
186, 57, 248, 188, 91, 196, 4, 188, 183, 38, 70, 124, 49, 180, 4, 84, 20, 69, 1, 68, 216, 108, 84, 81, 55, 218, 156, 246, 69, 85, 103, 105, 200, 51, 66, 138, 139, 58, 194, 164, 185, 77, 4, 158, 191, 61, 3, 160, 4, 59, 253, 149, 59, 130, 0, 13, 79, 95, 163, 45, 32, 13, 100, 206, 159, 63, 44, 155, 126, 96, 23, 173, 17, 96, 38, 219, 214, 172, 154, 212, 146, 40, 40, 136, 138, 162, 0, 168, 32, 52, 2, 130, 234, 8, 46, 154, 138, 12, 80, 162, 162, 44, 244, 69, 128, 120, 186, 172, 147, 25, 187, 243, 253, 38, 207, 144, 11, 197, 16, 6, 178, 38, 46, 254, 153, 239, 238, 11, 122, 247, 232, 221, 207, 245, 206, 242, 0, 0, 11, 25, 80, 104, 77, 52, 20, 64, 20, 79, 194, 128, 6, 10, 208, 125, 142, 80, 3, 41, 158, 224, 11, 147, 220, 200, 12, 81, 108, 69, 68, 89, 213, 0, 106, 244, 162, 105, 160, 78, 135, 116, 0, 166, 116, 231, 175, 177, 83, 122, 123, 2, 231, 227, 23, 61, 183, 163, 59, 122, 27, 238, 131, 92, 51, 122, 136, 160, 10, 2, 162, 128, 0, 40, 0, 0, 13, 128, 138, 40, 0, 0, 35, 56, 210, 3, 3, 105, 145, 70, 81, 172, 48, 70, 69, 202, 101, 4, 12, 166, 80, 17, 113, 112, 25, 26, 192, 16, 80, 0, 80, 65, 64, 69, 0, 0, 21, 176, 16, 0, 0, 0, 68, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 20, 0, 64, 0, 31, 255, 217 ] } ], "outputs": [ { "name": "DALI_OUTPUT_0" } ] }'
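For reference, a request body of the same form can be generated from an encoded image instead of pasting the bytes by hand. This is a minimal sketch, not part of the original report: the helper name `build_infer_request` is hypothetical, and the input and output names are taken from the config.pbtxt in this issue.

```python
import json


def build_infer_request(encoded_image: bytes,
                        input_name: str = "DALI_INPUT_0",
                        output_name: str = "DALI_OUTPUT_0") -> dict:
    """Build a JSON-serializable body for POST /v2/models/<model>/infer.

    Hypothetical helper: it mirrors the structure of the curl payload
    above, with the encoded image sent as a flat list of UINT8 values.
    """
    data = list(encoded_image)  # each byte becomes an int in 0..255
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(data)],  # 1-D tensor: one entry per byte
                "datatype": "UINT8",
                "data": data,
            }
        ],
        "outputs": [{"name": output_name}],
    }


# Stand-in bytes (JPEG start and end markers only), not a decodable image.
body = json.dumps(build_infer_request(bytes([255, 216, 255, 217])))
```

In practice the bytes would come from reading the JPEG file in binary mode, so that `shape` always matches the actual length of `data`.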

config.pbtxt

name: "dali"
backend: "dali"
max_batch_size: 0
input [
{
   name: "DALI_INPUT_0"
   data_type: TYPE_UINT8
   dims: [ -1 ]
}
]

output [
{
   name: "DALI_OUTPUT_0"
   data_type: TYPE_UINT8
   dims: [ 224, 224, 3 ]
}
]

instance_group [
{
  kind: KIND_GPU
}
]

dali.py

import nvidia.dali as dali
from nvidia.dali.plugin.triton import autoserialize

@autoserialize 
@dali.pipeline_def(batch_size=256, num_threads=4, device_id=0) 
def pipe():
    images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = dali.fn.decoders.image(images, device="mixed")
    images = dali.fn.resize(images, resize_x=224, resize_y=224)
    return images

szalpal commented 1 week ago

Hi @vaibhavjainwiz ,

let me help with your problem. I'd like to understand what happens first, so could you clarify one thing? The comments in the DALI pipeline code you sent say that the decoding and resize happen on the CPU, but the code actually runs them on the GPU:

    images = dali.fn.decoders.image(images, device="mixed")  # Decode on CPU
    images = dali.fn.resize(images, resize_x=224, resize_y=224)  # Resize on CPU

With device='mixed', the decoding is performed on the GPU, and the resize infers the GPU device from the previous operation. What precisely was your intention here?

vaibhavjainwiz commented 1 week ago

Sorry for the confusion. I was trying out this pipeline on both CPU and GPU, and these comments are leftovers; please ignore them. I am removing them from the issue description to avoid further confusion.

szalpal commented 1 week ago

Thank you for the clarification. I've run your model with the provided sample as a stand-alone DALI pipeline and everything worked fine. However, I didn't run it under K8s or Triton. Do you happen to have a Triton stack trace that might help narrow down the issue?

Also, one thing that comes to mind when looking at the configuration: you're setting max_batch_size=0 in config.pbtxt, while the DALI pipeline sets batch_size=256. The max_batch_size=0 option in Triton is generally used for models that do not support batching. Could you check whether setting these two params to the same value (e.g. max_batch_size=256) helps?
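One consequence of that change is worth keeping in mind: when max_batch_size is greater than 0, Triton treats the first dimension of every request tensor as the batch dimension, while the dims in config.pbtxt describe a single sample. The request shape [ 1699 ] from the original curl call would then need to become [ 1, 1699 ]. A minimal sketch of that rule follows; the helper name `request_shape` is hypothetical and this is only an illustration, not a diagnosis of the crash.

```python
def request_shape(sample_shape, batch, max_batch_size):
    """Return the tensor shape a client should send to Triton.

    Hypothetical helper illustrating the batching convention:
    - max_batch_size == 0: no batching, the shape is sent as-is;
    - max_batch_size > 0: a leading batch dimension is required.
    """
    if max_batch_size == 0:
        return list(sample_shape)
    if not 1 <= batch <= max_batch_size:
        raise ValueError(
            f"batch {batch} is outside the range 1..{max_batch_size}"
        )
    return [batch, *sample_shape]


# Current config (max_batch_size: 0): shape stays [1699].
unbatched = request_shape([1699], batch=1, max_batch_size=0)

# Suggested config (max_batch_size: 256): one sample is sent as [1, 1699].
batched = request_shape([1699], batch=1, max_batch_size=256)
```

The "data" field of the JSON request can stay a flat list in both cases; only "shape" changes.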