triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Issue while loading the model using TIS (Triton Inference Server) : For the model to support batching the shape should have at least 1 dimension and the first dimension must be -1 #7462

Open Vaishnvi opened 3 months ago

Vaishnvi commented 3 months ago

I have exported the model using the following code:

    import torch

    torch.onnx.export(
        crnn.module,               # use .module to unwrap the model
        example_input,             # model input (or a tuple for multiple inputs)
        "bhaasha_model.onnx",      # where to save the model (can be a file or file-like object)
        export_params=True,        # store the trained parameter weights inside the model file
        verbose=True,
        opset_version=17,          # the ONNX version to export the model to
        do_constant_folding=True,  # whether to execute constant folding for optimization
        input_names=['input'],     # the model's input names
        output_names=['onnx::Shape_261', 'input.79', 'inp'],  # the model's output names
        dynamic_axes={             # variable-length axes
            'input': {0: 'batch_size'},
            'onnx::Shape_261': {0: 'batch_size'},
            'input.79': {0: 'batch_size'},
            'inp': {0: 'batch_size'},
        },
    )

The exported model is working fine using onnxruntime.
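For reference, one quick way to see which shapes actually ended up in the exported graph is to print the session's declared outputs (a minimal sketch; the file name and output names follow the export call above):

    import onnxruntime as ort

    # Print the declared shape of every graph output in the exported model.
    # A dynamic batch axis shows up as a symbolic name such as 'batch_size';
    # a fixed integer (e.g. 64) would mean that axis was baked into the graph.
    sess = ort.InferenceSession("bhaasha_model.onnx")
    for out in sess.get_outputs():
        print(out.name, out.shape)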

I am facing an issue while loading the model with Triton Inference Server (TIS).

I get this error:

failed to load 'bhaasha_ocr' version 1: Invalid argument: model 'bhaasha_ocr', tensor 'inp': for the model to support batching the shape should have at least 1 dimension and the first dimension must be -1; but shape expected by the model is [64,32,246]

This is very unusual, because I am not getting this error for the other two outputs; only the third output triggers it. How is this possible?

config.pbtxt:

    name: "bhaasha_ocr"
    backend: "onnxruntime"
    max_batch_size: 64

    input [
      { name: "input" data_type: TYPE_FP32 dims: [ 1, 96, 256 ] }
    ]

    output [
      { name: "onnx::Shape_261" data_type: TYPE_FP32 dims: [ 20, 2 ] },
      { name: "input.79" data_type: TYPE_FP32 dims: [ 1, 96, 256 ] },
      { name: "inp" data_type: TYPE_FP32 dims: [ 32, 246 ] }
    ]

    dynamic_batching { preferred_batch_size: [ 2, 4, 8, 16, 32, 64 ] }

How can I resolve this?
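One way to narrow this down, sketched below with the onnx package (the file name is assumed from the export call above), is to check whether the first dimension of 'inp' was actually exported as the symbolic 'batch_size' or ended up fixed at 64, since the shape Triton complains about comes from the model's own output metadata:

    import onnx

    # Inspect the output shapes declared in the serialized graph. For each
    # dimension, dim_param is set when the axis is symbolic (dynamic) and
    # dim_value when it is a fixed integer.
    model = onnx.load("bhaasha_model.onnx")
    for out in model.graph.output:
        dims = [d.dim_param if d.dim_param else d.dim_value
                for d in out.type.tensor_type.shape.dim]
        print(out.name, dims)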

Vaishnvi commented 3 months ago

I even tried with max_batch_size: 0, and it still gives this error:

failed to load 'bhaasha_ocr' version 1: Invalid argument: model 'bhaasha_ocr', tensor 'inp': the model expects 3 dimensions (shape [64,32,246]) but the model configuration specifies 3 dimensions (shape [-1,32,246])

The config in that case was:

name: "bhaasha_ocr" backend: "onnxruntime" max_batch_size: 0

input [ { name: "input" data_type: TYPE_FP32 dims: [-1, 1, 96, 256] } ]

output [ { name: "onnx::Shape_261" data_type: TYPE_FP32 dims: [-1, 20, 2] }, { name: "input.79" data_type: TYPE_FP32 dims: [-1, 1, 96, 256] }, { name: "inp" data_type: TYPE_FP32 dims: [-1, 32, 246] } ]

rmccorm4 commented 3 months ago

Hi @Vaishnvi, thanks for sharing such detailed info. Since this is an ONNX model and the ONNX Runtime backend supports full config auto-complete, can you try loading the model without any config.pbtxt? That will generate the I/O section of the config directly from the model metadata and help us better understand what might be going wrong.
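For what it's worth, once tritonserver is running with no config.pbtxt in the model directory (config auto-complete is enabled by default in recent releases; older ones need --strict-model-config=false), the configuration Triton generated can be fetched back for inspection. A minimal sketch, assuming the model name from this issue and the default HTTP port:

    import requests

    # Fetch the auto-completed model configuration from Triton's HTTP
    # model-configuration endpoint; the I/O shapes shown are what Triton
    # derived from the ONNX model metadata.
    resp = requests.get("http://localhost:8000/v2/models/bhaasha_ocr/config")
    print(resp.json())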