I am using the TensorRT backend. My model's input dimensions are [-1, -1, 7] and the maximum batch size is 2.
I start a service with tritonserver.
When I send requests, one request has input dimensions [1, 4, 7] and another has [1, 6, 7].
These two requests cannot be batched together; they can only be processed one by one.
A batch is formed automatically only when the last two dimensions of the requests are the same.
For now, the only thing I do is make the last two dimensions identical across requests in preprocessing, so that they can be batched for inference.
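The preprocessing step described above could look roughly like the sketch below. It assumes the requests are NumPy arrays, that zero-padding along the second dimension is acceptable to the model, and that a fixed target length is known in advance. The names `pad_to_length` and `TARGET_LEN` are hypothetical and are not part of Triton or TensorRT.

```python
import numpy as np

# Hypothetical fixed length that every request is padded to (an assumption).
TARGET_LEN = 6

def pad_to_length(x, target_len, pad_value=0.0):
    """Pad a [batch, seq_len, 7] array along axis 1 up to target_len."""
    extra = target_len - x.shape[1]
    if extra < 0:
        raise ValueError("input is longer than target_len")
    # Pad only at the end of axis 1; the batch and feature axes are untouched.
    return np.pad(x, ((0, 0), (0, extra), (0, 0)), constant_values=pad_value)

# Two requests with the shapes from the example above.
req_a = np.ones((1, 4, 7), dtype=np.float32)
req_b = np.ones((1, 6, 7), dtype=np.float32)

padded_a = pad_to_length(req_a, TARGET_LEN)
padded_b = pad_to_length(req_b, TARGET_LEN)
print(padded_a.shape, padded_b.shape)
```

After padding, both requests have the same shape, so they meet the same-shape condition for batching. Whether the padded positions affect the model's output depends on the model, so the output for those positions may need to be masked or discarded.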