briedel opened this issue 8 months ago
Hello, can you explain what kind of model you are using? Some models do not support dynamic batching.
It looks like in your config.pbtxt you put `dims: [ 10, 10, 60, 16 ]`. However, you should be setting `dims: [ -1, 10, 10, 60, 16 ]` in config.pbtxt, because you are using variable input dimension sizes. Since your model supports the 4D input, the auto-complete assumes that will be the shape of each request; it does not know that your inputs are already batched.
The problem is in the model configuration's shapes, and the error message is telling you that changing `max_batch_size` alone does not solve it.
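As a sketch (only the relevant fields are shown; everything else stays as in your config), the two self-consistent ways to write this are either to let Triton manage the batch dimension, or to turn Triton batching off and put the full shape, batch dimension included, into `dims`:

```
# Option A: Triton-managed batching; dims exclude the batch dimension.
max_batch_size: 64
input [
  {
    name: "input_branch1"
    data_type: TYPE_FP32
    dims: [ 10, 10, 60, 16 ]
  }
]

# Option B: no Triton batching (max_batch_size: 0); dims are the full
# shape, with -1 for the variable batch dimension.
max_batch_size: 0
input [
  {
    name: "input_branch1"
    data_type: TYPE_FP32
    dims: [ -1, 10, 10, 60, 16 ]
  }
]
```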
This is only tangential to dynamic batching on the server side, which additionally batches requests together to form larger inputs for the model to process at once. See here for details on dynamic batching. You are trying to use dynamic batching on your model as well: as noted in your first error message, you enabled batching by specifying a maximum batch size.
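For completeness, server-side dynamic batching is enabled by its own section in config.pbtxt; a minimal sketch (the values are illustrative, not a recommendation):

```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```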
cc: @tanmayv25 @nv-kmcgill53 (for auto-complete)

---

When inputting a single input array with shape `[10, 10, 60, 16]`, I get the following error:

```
tritonclient.utils.InferenceServerException: [400] [request id: <id_unknown>] unexpected shape for input 'input_branch1' for model 'tglauch_classifier'. Expected [-1,10,10,60,16], got [10,10,60,16]. NOTE: Setting a non-zero max_batch_size in the model config requires a batch dimension to be prepended to each input shape. If you want to specify the full shape including the batch dim in your input dims config, try setting max_batch_size to zero. See the model configuration docs for more info on max_batch_size.
```
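A minimal sketch of the call that hits this, assuming the HTTP client on localhost and random data in place of the real input; prepending a batch dimension of 1 is what the NOTE in the error asks for:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One example in the model's per-request shape.
single = np.random.rand(10, 10, 60, 16).astype(np.float32)

# Prepend the batch dimension: [10, 10, 60, 16] -> [1, 10, 10, 60, 16].
batched = np.expand_dims(single, axis=0)

infer_input = httpclient.InferInput("input_branch1", list(batched.shape), "FP32")
infer_input.set_data_from_numpy(batched)

result = client.infer(model_name="tglauch_classifier", inputs=[infer_input])
print(result.as_numpy("Target1").shape)  # expected: (1, 5)
```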
When passing a batched input array of shape `[48, 10, 10, 60, 16]`, I get this error:

```
tritonclient.utils.InferenceServerException: got unexpected numpy array shape [48, 10, 10, 60, 16], expected [10, 10, 60, 16]
```
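This second message appears to be raised on the client side: `set_data_from_numpy()` checks the numpy array against the shape declared on the `InferInput`. Continuing the sketch above, a declaration that matches a batched array would be:

```python
batch = np.random.rand(48, 10, 10, 60, 16).astype(np.float32)

# The declared shape must match the full array, batch dimension included;
# declaring only [10, 10, 60, 16] raises the "unexpected numpy array
# shape" exception above.
infer_input = httpclient.InferInput("input_branch1", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)
```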
The autogenerated config.pbtxt is:
{ "name": "name", "platform": "onnxruntime_onnx", "backend": "onnxruntime", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 64, "input": [ { "name": "input_branch1", "data_type": "TYPE_FP32", "dims": [ 10, 10, 60, 16 ] } ], "output": [ { "name": "Target1", "data_type": "TYPE_FP32", "dims": [ 5 ] } ], "batch_input": [], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "instance_group": [ { "name": "tglauch_classifier", "kind": "KIND_GPU", "count": 1, "gpus": [ 0 ], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "model.onnx", "cc_model_filenames": {}, "metric_tags": {}, "parameters": {}, "model_warmup": [], "dynamic_batching": {} }
When looking at the config, I see:

```
Model Config: {
    'name': 'name',
    'platform': 'onnxruntime_onnx',
    'backend': 'onnxruntime',
    'version_policy': {'latest': {'num_versions': 1}},
    'max_batch_size': 64,
    'input': [{'name': 'input_branch1', 'data_type': 'TYPE_FP32',
               'format': 'FORMAT_NONE', 'dims': [10, 10, 60, 16],
               'is_shape_tensor': False, 'allow_ragged_batch': False,
               'optional': False}],
    'output': [{'name': 'Target1', 'data_type': 'TYPE_FP32', 'dims': [5],
                'label_filename': '', 'is_shape_tensor': False}],
    'batch_input': [],
    'batch_output': [],
    'optimization': {'priority': 'PRIORITY_DEFAULT',
                     'input_pinned_memory': {'enable': True},
                     'output_pinned_memory': {'enable': True},
                     'gather_kernel_buffer_threshold': 0,
                     'eager_batching': False},
    'dynamic_batching': {'preferred_batch_size': [64],
                         'max_queue_delay_microseconds': 0,
                         'preserve_ordering': False,
                         'priority_levels': 0,
                         'default_priority_level': 0,
                         'priority_queue_policy': {}},
    'instance_group': [{'name': 'tglauch_classifier', 'kind': 'KIND_GPU',
                        'count': 1, 'gpus': [0], 'secondary_devices': [],
                        'profile': [], 'passive': False, 'host_policy': ''}],
    'default_model_filename': 'model.onnx',
    'cc_model_filenames': {},
    'metric_tags': {},
    'parameters': {},
    'model_warmup': []
}
```
and the metadata:

```
Model Metadata: {
    'name': 'name',
    'versions': ['1'],
    'platform': 'onnxruntime_onnx',
    'inputs': [{'name': 'input_branch1', 'datatype': 'FP32',
                'shape': [-1, 10, 10, 60, 16]}],
    'outputs': [{'name': 'Target1', 'datatype': 'FP32', 'shape': [-1, 5]}]
}
(i3.py:518 in _configure_model)
```
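Both dumps can be fetched programmatically as well; a sketch reusing the hypothetical `client` object from above (model name as in the earlier error messages; adjust it to whatever name the server reports):

```python
config = client.get_model_config("tglauch_classifier")
metadata = client.get_model_metadata("tglauch_classifier")

# config['input'][0]['dims'] omits the batch dim; the metadata reports
# the full shape, e.g. [-1, 10, 10, 60, 16].
print(config["max_batch_size"], config["input"][0]["dims"])
print(metadata["inputs"][0]["shape"])
```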
Changing `max_batch_size` to 0 throws a different error:

```
tritonclient.utils.InferenceServerException: [400] [request id: <id_unknown>] inference request batch-size must be <= 4 for 'name'
```
I did query the config via REST, and `max_batch_size` had been changed to 4:

```
Model Config: {
    'name': 'name',
    'platform': 'onnxruntime_onnx',
    'backend': 'onnxruntime',
    'version_policy': {'latest': {'num_versions': 1}},
    'max_batch_size': 4,
    'input': [{'name': 'input_branch1', 'data_type': 'TYPE_FP32',
               'format': 'FORMAT_NONE', 'dims': [10, 10, 60, 16],
               'is_shape_tensor': False, 'allow_ragged_batch': False,
               'optional': False}],
    'output': [{'name': 'Target1', 'data_type': 'TYPE_FP32', 'dims': [5],
                'label_filename': '', 'is_shape_tensor': False}],
    'batch_input': [],
    'batch_output': [],
    'optimization': {'priority': 'PRIORITY_DEFAULT',
                     'input_pinned_memory': {'enable': True},
                     'output_pinned_memory': {'enable': True},
                     'gather_kernel_buffer_threshold': 0,
                     'eager_batching': False},
    'dynamic_batching': {'preferred_batch_size': [4],
                         'max_queue_delay_microseconds': 0,
                         'preserve_ordering': False,
                         'priority_levels': 0,
                         'default_priority_level': 0,
                         'priority_queue_policy': {}},
    'instance_group': [{'name': 'name', 'kind': 'KIND_GPU', 'count': 1,
                        'gpus': [0], 'secondary_devices': [], 'profile': [],
                        'passive': False, 'host_policy': ''}],
    'default_model_filename': 'model.onnx',
    'cc_model_filenames': {},
    'metric_tags': {},
    'parameters': {},
    'model_warmup': []
}
```
The config auto-complete changes the `max_batch_size` parameter from 0 to 4 in my config.pbtxt. I did notice the following in the server log where the config was loaded:

```
W0125 17:09:56.807240 1 onnxruntime.cc:813] autofilled max_batch_size to 4 for model 'tglauch_classifier' since batching is supporrted but no max_batch_size is specified in model configuration. Must specify max_batch_size to utilize autofill with a larger max batch size
```
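Reading that warning, batches of 48 will only fit if `max_batch_size` is stated explicitly rather than left to autofill, which caps it at 4. A sketch of the relevant config.pbtxt lines under that reading (the limit of 64 is an assumption; anything >= 48 should do):

```
max_batch_size: 64
input [
  {
    name: "input_branch1"
    data_type: TYPE_FP32
    dims: [ 10, 10, 60, 16 ]
  }
]
dynamic_batching { }
```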