triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

The Triton server does not enable In-Flight Batch, and Dynamic Batch does not take effect. #239

Open StarrickLiu opened 10 months ago

StarrickLiu commented 10 months ago

According to the instructions, uncommenting the dynamic batching section in the model's config.pbtxt should enable dynamic batching, but even after uncommenting it, batching does not take effect.

[Screenshot attached: WeCom screenshot of the configuration]

Both the Triton server and the engine are configured with a batch size of 128, and in practical testing the Triton server is able to handle requests with batch size 128.
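For reference, this is the kind of configuration involved. A minimal sketch of the relevant stanzas in the tensorrt_llm model's config.pbtxt, assuming the parameter names from the backend's shipped template; the queue-delay value is purely illustrative:

```
# In-flight batching is controlled by the backend's gpt_model_type parameter,
# not by Triton's dynamic batcher. "V1" disables it; "inflight_fused_batching"
# requires an engine built with paged KV cache and remove_input_padding.
parameters: {
  key: "gpt_model_type"
  value: {
    string_value: "inflight_fused_batching"
  }
}

# Triton-side dynamic batching: this is the stanza the instructions say to
# uncomment, so Triton groups queued requests before calling the backend.
dynamic_batching {
  max_queue_delay_microseconds: 100  # illustrative value
}

max_batch_size: 128
```

Note the design split: Triton's `dynamic_batching` groups requests at the server level, while in-flight batching happens inside the TensorRT-LLM runtime, so the two are configured independently.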

shannonphu commented 10 months ago

How did you figure out that dynamic batching did not happen?
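One common way to check this is Triton's Prometheus metrics endpoint (by default `curl localhost:8002/metrics`): if dynamic batching is working, `nv_inference_exec_count` grows more slowly than `nv_inference_count`, because several requests share one model execution. A small sketch that parses a metrics dump and computes the average batch size; the sample metrics text below is fabricated for illustration, not real server output:

```python
import re


def avg_batch_size(metrics_text: str, model: str) -> float:
    """Average requests per execution: nv_inference_count / nv_inference_exec_count.

    A ratio > 1 means Triton grouped queued requests into shared batches.
    """
    def metric(name: str) -> float:
        # Match e.g.: nv_inference_count{model="tensorrt_llm",version="1"} 512
        pattern = rf'{name}{{[^}}]*model="{model}"[^}}]*}}\s+([0-9.e+]+)'
        m = re.search(pattern, metrics_text)
        if not m:
            raise ValueError(f"{name} not found for model {model!r}")
        return float(m.group(1))

    return metric("nv_inference_count") / metric("nv_inference_exec_count")


# Fabricated example snippet; real text comes from the /metrics endpoint
# of a running Triton server.
sample = '''
nv_inference_count{model="tensorrt_llm",version="1"} 512
nv_inference_exec_count{model="tensorrt_llm",version="1"} 64
'''
print(avg_batch_size(sample, "tensorrt_llm"))  # 8.0 -> batching is happening
```

If the ratio stays at 1.0 under concurrent load, requests are being executed one at a time, which would support the report that batching is not taking effect.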

juney-nvidia commented 10 months ago

@StarrickLiu

Can you share more details about "The Triton server does not enable In-Flight Batching, and Dynamic Batching does not take effect"?

Based on your description, it is hard for us to take any actions.

June