triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

About automatic Batch #7738

Closed CallmeZhangChenchen closed 4 weeks ago

CallmeZhangChenchen commented 1 month ago

For the TensorRT backend, my model's input dimensions are [-1, -1, 7] and the maximum batch size is 2. I start a service with tritonserver.
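For context, a minimal `config.pbtxt` matching that description might look like the sketch below. The model and tensor names are assumptions; note that with `max_batch_size` set, Triton treats the leading batch dimension as implicit, so `dims` lists only the remaining [-1, 7]:

```
name: "my_trt_model"          # assumed model name
platform: "tensorrt_plan"
max_batch_size: 2
input [
  {
    name: "INPUT__0"          # assumed tensor name
    data_type: TYPE_FP32      # assumed dtype
    dims: [ -1, 7 ]           # full shape incl. batch: [-1, -1, 7]
  }
]
output [
  {
    name: "OUTPUT__0"         # assumed tensor name
    data_type: TYPE_FP32
    dims: [ -1, 7 ]           # assumed output shape
  }
]
dynamic_batching { }
```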

When I send two requests, one with shape [1, 4, 7] and the other with shape [1, 6, 7], they cannot be batched together and are handled one by one. A batch is formed automatically only when the non-batch dimensions are the same.

For now I work around this by padding the variable dimensions to the same shape in preprocessing, so that the requests can be batched for inference (see the sketch below).
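A padding step along those lines might look like this; the function name and the pad value are assumptions, and it simply right-pads the variable dimension to a common length:

```python
import numpy as np

def pad_to_common_length(batch_inputs, pad_value=0.0):
    """Right-pad the variable (second) dimension of each [1, T, 7] array
    to the maximum T in the group, so all requests share one shape."""
    max_len = max(x.shape[1] for x in batch_inputs)
    padded = []
    for x in batch_inputs:
        pad_width = ((0, 0), (0, max_len - x.shape[1]), (0, 0))
        padded.append(np.pad(x, pad_width, constant_values=pad_value))
    return padded

# Example: [1, 4, 7] and [1, 6, 7] both become [1, 6, 7] and can now batch.
a = np.ones((1, 4, 7), dtype=np.float32)
b = np.ones((1, 6, 7), dtype=np.float32)
print([p.shape for p in pad_to_common_length([a, b])])  # [(1, 6, 7), (1, 6, 7)]
```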

CallmeZhangChenchen commented 4 weeks ago

For the record: I verified that requests with different shapes can in fact be batched automatically.
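For anyone who lands here: Triton's dynamic batcher can combine requests whose non-batch dimensions differ when the input is marked as ragged and the backend supports it, which the TensorRT backend does. A sketch of the relevant config fragment (input name assumed; the backend typically also needs the per-request shapes supplied, e.g. via a `batch_input` entry):

```
input [
  {
    name: "INPUT__0"            # assumed tensor name
    data_type: TYPE_FP32
    dims: [ -1, 7 ]
    allow_ragged_batch: true    # lets the dynamic batcher mix shapes
  }
]
dynamic_batching { }
```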