triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Triton considers max_batch_size as a number of channels for a given input image #7450


12sf12 commented 3 months ago

Description: I'm having a strange issue integrating a TensorRT model into Triton. When I retrieve the model configuration, the max_batch_size is reported as the number of channels of a 3\*H\*W image input. That is, for a C\*H\*W image Triton returns max_batch_size=C=3 and dims=H\*W. Note that the model works fine in a Python environment, where I have already obtained correct results from it.

Triton Information

$ curl -v localhost:8000/v2
Connection #0 to host localhost left intact
{"name":"triton","version":"2.46.0","extensions":["classification","sequence","model_repository","model_repository(unload_dependents)","schedule_policy","model_configuration","system_shared_memory","cuda_shared_memory","binary_tensor_data","parameters","statistics","trace","logging"]}

Are you using the Triton container or did you build it yourself? I use the container as-is, with no modifications.

To Reproduce: curl localhost:8000/v2/models/txspot/config

[screenshot of the model configuration returned by the request above]

For the above example, the dims should have been 3\*1152\*2048, while Triton returned max_batch_size=3 and dims=1152\*2048.

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

the config.pbtxt is:

name: "txspot"
platform: "tensorrt_plan"
max_batch_size: 0
default_model_filename: "./models/txtspotting_r50_trt86_v0.1.1_2K.engine"

Expected behavior: The reported dimensions should have been C\*H\*W, but Triton treats the number of channels (C) as the max_batch_size and reports only H\*W as the dimensions. So the max_batch_size is 3, which equals C.
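To make the expectation concrete, this is a sketch of the auto-completed configuration the reporter expected Triton to return for this model (the tensor name "input" and data type are hypothetical placeholders, not taken from the actual engine):

```
name: "txspot"
platform: "tensorrt_plan"
max_batch_size: 0
input [
  {
    name: "input"            # hypothetical tensor name
    data_type: TYPE_FP32     # assumed data type
    dims: [ 3, 1152, 2048 ]  # full C*H*W shape, no separate batch dimension
  }
]
```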

sourabh-burnwal commented 3 months ago

@12sf12 how did you export this model? Can you share the trtexec command?

Also, can you share the complete config.pbtxt for this model? If you are setting max_batch_size to 0, the batch dimension has to be included in the tensor definitions themselves.
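As a hedged illustration of that rule (tensor name and data type are placeholders): with max_batch_size: 0, Triton performs no implicit batching, so dims must spell out the full tensor shape; with max_batch_size > 0, the leading batch dimension is implied and omitted from dims.

```
# Variant A: no implicit batching -- dims carry the full shape
max_batch_size: 0
input [
  { name: "input", data_type: TYPE_FP32, dims: [ 1, 3, 1152, 2048 ] }
]

# Variant B: Triton-managed batching -- batch dimension omitted from dims
max_batch_size: 8
input [
  { name: "input", data_type: TYPE_FP32, dims: [ 3, 1152, 2048 ] }
]
```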