triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Conflicting error messages for batching mode in python backend #6832

Open briedel opened 8 months ago

briedel commented 8 months ago

When passing a single (unbatched) input array with

inputs = [httpclient.InferInput("input_branch1",
                                self.model_input_shape,
                                "FP32")]
outputs = [httpclient.InferRequestedOutput("Target1")]

inputs[0].set_data_from_numpy(input_data[0].astype(np.single))

I get the following error:

tritonclient.utils.InferenceServerException: [400] [request id: <id_unknown>] unexpected shape for input 'input_branch1' for model 'tglauch_classifier'. Expected [-1,10,10,60,16], got [10,10,60,16]. NOTE: Setting a non-zero max_batch_size in the model config requires a batch dimension to be prepended to each input shape. If you want to specify the full shape including the batch dim in your input dims config, try setting max_batch_size to zero. See the model configuration docs for more info on max_batch_size.
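
(For reference, a minimal client-side sketch of what that message asks for when max_batch_size stays at 64 in config.pbtxt: the batch dimension has to be prepended by the caller. The URL, the model name, and the random input_data array below are placeholders for the real setup.)

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # placeholder URL

# Placeholder for the real data; shape (48, 10, 10, 60, 16) as in the logs below
input_data = np.random.rand(48, 10, 10, 60, 16)

single = input_data[0].astype(np.single)   # shape (10, 10, 60, 16)
batched = np.expand_dims(single, axis=0)   # shape (1, 10, 10, 60, 16): batch dim prepended

inputs = [httpclient.InferInput("input_branch1", list(batched.shape), "FP32")]
inputs[0].set_data_from_numpy(batched)
outputs = [httpclient.InferRequestedOutput("Target1")]

result = client.infer("tglauch_classifier", inputs, outputs=outputs)
print(result.as_numpy("Target1").shape)    # expected (1, 5) given the output dims below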

When passing a batched input array:

inputs = [httpclient.InferInput("input_branch1",
                                self.model_input_shape,
                                "FP32")]
outputs = [httpclient.InferRequestedOutput("Target1")]

inputs[0].set_data_from_numpy(input_data.astype(np.single))

I get this error:

tritonclient.utils.InferenceServerException: got unexpected numpy array shape [48, 10, 10, 60, 16], expected [10, 10, 60, 16]
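
(This second message appears to be raised client-side by set_data_from_numpy(), which checks the numpy array against the shape passed to InferInput — here still the unbatched self.model_input_shape. A sketch of sending the whole batch in one request, reusing client and input_data from the sketch above and declaring the batched shape:)

batch = input_data.astype(np.single)       # shape (48, 10, 10, 60, 16)
inputs = [httpclient.InferInput("input_branch1", list(batch.shape), "FP32")]
inputs[0].set_data_from_numpy(batch)
outputs = [httpclient.InferRequestedOutput("Target1")]
result = client.infer("tglauch_classifier", inputs, outputs=outputs)

48 is within max_batch_size: 64, so once the shapes line up the server should accept the request.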

The config.pbtxt is:

name: "name"
platform: "onnxruntime_onnx"
max_batch_size: 64

and the autogenerated config.pbtxt is:

{
  "name": "name",
  "platform": "onnxruntime_onnx",
  "backend": "onnxruntime",
  "version_policy": { "latest": { "num_versions": 1 } },
  "max_batch_size": 64,
  "input": [ { "name": "input_branch1", "data_type": "TYPE_FP32", "dims": [ 10, 10, 60, 16 ] } ],
  "output": [ { "name": "Target1", "data_type": "TYPE_FP32", "dims": [ 5 ] } ],
  "batch_input": [],
  "batch_output": [],
  "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false },
  "instance_group": [ { "name": "tglauch_classifier", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ],
  "default_model_filename": "model.onnx",
  "cc_model_filenames": {},
  "metric_tags": {},
  "parameters": {},
  "model_warmup": [],
  "dynamic_batching": {}
}

When I look at the config, the model config is:

Model Config: {'name': 'name', 'platform': 'onnxruntime_onnx', 'backend': 'onnxruntime',
 'version_policy': {'latest': {'num_versions': 1}},
 'max_batch_size': 64,
 'input': [{'name': 'input_branch1', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [10, 10, 60, 16], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': False}],
 'output': [{'name': 'Target1', 'data_type': 'TYPE_FP32', 'dims': [5], 'label_filename': '', 'is_shape_tensor': False}],
 'batch_input': [], 'batch_output': [],
 'optimization': {'priority': 'PRIORITY_DEFAULT', 'input_pinned_memory': {'enable': True}, 'output_pinned_memory': {'enable': True}, 'gather_kernel_buffer_threshold': 0, 'eager_batching': False},
 'dynamic_batching': {'preferred_batch_size': [64], 'max_queue_delay_microseconds': 0, 'preserve_ordering': False, 'priority_levels': 0, 'default_priority_level': 0, 'priority_queue_policy': {}},
 'instance_group': [{'name': 'tglauch_classifier', 'kind': 'KIND_GPU', 'count': 1, 'gpus': [0], 'secondary_devices': [], 'profile': [], 'passive': False, 'host_policy': ''}],
 'default_model_filename': 'model.onnx', 'cc_model_filenames': {}, 'metric_tags': {}, 'parameters': {}, 'model_warmup': []}

and the model metadata is:

Model Metadata: {'name': 'name', 'versions': ['1'], 'platform': 'onnxruntime_onnx', 'inputs': [{'name': 'input_branch1', 'datatype': 'FP32', 'shape': [-1, 10, 10, 60, 16]}], 'outputs': [{'name': 'Target1', 'datatype': 'FP32', 'shape': [-1, 5]}]} (i3.py:518 in _configure_model)
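
(For reference, the same config and metadata can be pulled with the Python client instead of raw REST; a small sketch, assuming the server is reachable at localhost:8000 and the model is registered as tglauch_classifier:)

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
config = client.get_model_config("tglauch_classifier")      # auto-completed config as a dict
metadata = client.get_model_metadata("tglauch_classifier")  # shapes as the server sees them
print(config["max_batch_size"], metadata["inputs"][0]["shape"])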

Changing max_batch_size to 0 throws a different error:

tritonclient.utils.InferenceServerException: [400] [request id: <id_unknown>] inference request batch-size must be <= 4 for 'name'

Querying the config via REST, I found that max_batch_size had been changed to 4:

Model Config: {'name': 'name', 'platform': 'onnxruntime_onnx', 'backend': 'onnxruntime',
 'version_policy': {'latest': {'num_versions': 1}},
 'max_batch_size': 4,
 'input': [{'name': 'input_branch1', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [10, 10, 60, 16], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': False}],
 'output': [{'name': 'Target1', 'data_type': 'TYPE_FP32', 'dims': [5], 'label_filename': '', 'is_shape_tensor': False}],
 'batch_input': [], 'batch_output': [],
 'optimization': {'priority': 'PRIORITY_DEFAULT', 'input_pinned_memory': {'enable': True}, 'output_pinned_memory': {'enable': True}, 'gather_kernel_buffer_threshold': 0, 'eager_batching': False},
 'dynamic_batching': {'preferred_batch_size': [4], 'max_queue_delay_microseconds': 0, 'preserve_ordering': False, 'priority_levels': 0, 'default_priority_level': 0, 'priority_queue_policy': {}},
 'instance_group': [{'name': 'name', 'kind': 'KIND_GPU', 'count': 1, 'gpus': [0], 'secondary_devices': [], 'profile': [], 'passive': False, 'host_policy': ''}],
 'default_model_filename': 'model.onnx', 'cc_model_filenames': {}, 'metric_tags': {}, 'parameters': {}, 'model_warmup': []}

The config auto-complete changed max_batch_size in my config.pbtxt from 0 to 4.

I did notice this warning:

W0125 17:09:56.807240 1 onnxruntime.cc:813] autofilled max_batch_size to 4 for model 'tglauch_classifier' since batching is supporrted but no max_batch_size is specified in model configuration. Must specify max_batch_size to utilize autofill with a larger max batch size

when the config was:

name: "name"
platform: "onnxruntime_onnx"
max_batch_size: 0
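
(The first error message points at the alternative route: with max_batch_size: 0 the full shape, batch dimension included, has to be spelled out in dims. A sketch of what that could look like, with shapes copied from the metadata above and not verified against this model:)

name: "name"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "input_branch1"
    data_type: TYPE_FP32
    dims: [ -1, 10, 10, 60, 16 ]
  }
]
output [
  {
    name: "Target1"
    data_type: TYPE_FP32
    dims: [ -1, 5 ]
  }
]

If auto-complete still overrides max_batch_size, starting tritonserver with --disable-auto-complete-config (or the older --strict-model-config=true) should keep the configuration as written; worth checking against the server version in use.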
jbkyang-nvi commented 8 months ago

Hello, can you explain what kind of model you are using? Some models do not support dynamic batching.

It looks like you put dims: [10, 10, 60, 16] in your config.pbtxt. However, you should be setting dims: [-1, 10, 10, 60, 16] in config.pbtxt because you are using a variable input dimension size. Since your model supports the 4-D input, auto-complete assumes that is the input shape; it does not know that your inputs are already batched. The problem is in the model configuration shapes, which is why changing max_batch_size alone does not make the error go away.

This is only tangentially related to dynamic batching on the server side, which additionally batches individual requests together into larger inputs for the model to process at once. See here for details on dynamic batching. You are also trying to use dynamic batching on your model. As noted in your first error prompt, you enable batching by specifying a maximum batch size.
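
(For reference, a minimal dynamic_batching sketch in config.pbtxt; the values are illustrative only, not tuned for this model, and it only applies together with a non-zero max_batch_size:)

dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 100
}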

cc: @tanmayv25 @nv-kmcgill53 (for auto-complete)