triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Triton considers max_batch_size as a number of channels for a given input image #7450


12sf12 commented 3 months ago

Description: I'm having a strange issue integrating a TensorRT model into Triton. When I retrieve the model configuration, the max_batch_size is reported as the number of channels of a 3\*H\*W image input. That is, for a C\*H\*W image Triton returns max_batch_size=C=3 and dims=H\*W. Note that the model works fine in a Python environment, where I have already obtained correct results from it.

Triton Information

$ curl -v localhost:8000/v2
Connection #0 to host localhost left intact
{"name":"triton","version":"2.46.0","extensions":["classification","sequence","model_repository","model_repository(unload_dependents)","schedule_policy","model_configuration","system_shared_memory","cuda_shared_memory","binary_tensor_data","parameters","statistics","trace","logging"]}

Are you using the Triton container or did you build it yourself? I use the container as-is, with no modifications.

To Reproduce: curl localhost:8000/v2/models/txspot/config

[screenshot of the model configuration returned by the request above]

For the above example, the dims should have been 3\*1152\*2048, while Triton returned max_batch_size=3 and dims=1152\*2048.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

the config.pbtxt is:

name: "txspot"
platform: "tensorrt_plan"
max_batch_size: 0
default_model_filename: "./models/txtspotting_r50_trt86_v0.1.1_2K.engine"

Expected behavior: The reported dimensions should have been C\*H\*W, but Triton treats the number of channels (C) as the max_batch_size and reports only H\*W as the dimensions. So the max_batch_size is 3, which equals C.
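To make the expectation concrete, this is a sketch of the auto-completed configuration the reporter expected Triton to return for this model (the tensor name "input" and data type are hypothetical placeholders, not taken from the actual engine):

```
name: "txspot"
platform: "tensorrt_plan"
max_batch_size: 0
input [
  {
    name: "input"            # hypothetical tensor name
    data_type: TYPE_FP32     # assumed data type
    dims: [ 3, 1152, 2048 ]  # full C*H*W shape, no separate batch dimension
  }
]
```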

sourabh-burnwal commented 3 months ago

@12sf12 how did you export this model? Can you share the trtexec command?

Also, can you share the complete config.pbtxt for this model? If you are setting max_batch_size to 0, the batch dimension has to be included in the tensor definitions themselves.
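As a hedged illustration of that rule (tensor name and data type are placeholders): with max_batch_size: 0, Triton performs no implicit batching, so dims must spell out the full tensor shape; with max_batch_size > 0, the leading batch dimension is implied and omitted from dims.

```
# Variant A: no implicit batching -- dims carry the full shape
max_batch_size: 0
input [
  { name: "input", data_type: TYPE_FP32, dims: [ 1, 3, 1152, 2048 ] }
]

# Variant B: Triton-managed batching -- batch dimension omitted from dims
max_batch_size: 8
input [
  { name: "input", data_type: TYPE_FP32, dims: [ 3, 1152, 2048 ] }
]
```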