shreypandey opened this issue 2 years ago
We already have a ticket filed for this enhancement. https://github.com/triton-inference-server/server/issues/3286
A nice addition to this feature would be an option to pad the batched request, to avoid recompiling the model (for example when using torch.jit.trace or torch.compile) whenever it encounters an incomplete batch.
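For illustration, here is a rough sketch of what such padding could look like, assuming a model that was traced or compiled for a fixed batch size; the bucket size MAX_BATCH and the pad_batch helper are hypothetical and not part of any Triton API:

```python
import numpy as np

MAX_BATCH = 8  # hypothetical bucket size the model was traced/compiled with


def pad_batch(batch: np.ndarray, max_batch: int = MAX_BATCH):
    """Pad an incomplete [k, 81] batch up to [max_batch, 81] so a traced or
    compiled model always sees the same input shape and is never recompiled.
    Returns the padded batch and the number of real (unpadded) rows."""
    k = batch.shape[0]
    if k == max_batch:
        return batch, k
    pad = np.zeros((max_batch - k, *batch.shape[1:]), dtype=batch.dtype)
    return np.concatenate([batch, pad], axis=0), k


# usage: run the padded batch through the model, then drop the padded rows
padded, real_size = pad_batch(np.random.rand(5, 81).astype(np.float32))
# outputs = model(padded)[:real_size]
```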
Is your feature request related to a problem? Please describe.
The Triton Python backend should provide dynamic batching just like the other backends Triton supports. For example, for the model config mentioned below, inputs for the PyTorch/TensorFlow/ONNX backends will have shape [k, 81], where k is the batch size computed by Triton's dynamic batching. Inputs for the Python backend, in contrast, arrive as a Python list object of length k, where each element is a pb_utils.InferenceRequest object containing an array of shape [1, 81].

Describe the solution you'd like
The Python backend should provide inputs as a single array with the requests batched along the batch axis, as is the case for the other backends.
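For context, here is a minimal sketch of what Python backend models currently have to do by hand inside execute() to obtain a [k, 81] batch and then split the result back per request. The input name "INPUT0", output name "OUTPUT0", and the pass-through computation are assumptions for illustration only:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        # Each request carries a [1, 81] tensor; stack them manually to get [k, 81].
        inputs = [
            pb_utils.get_input_tensor_by_name(req, "INPUT0").as_numpy()
            for req in requests
        ]
        batch = np.concatenate(inputs, axis=0)  # shape [k, 81]

        result = batch  # placeholder for the actual model computation

        # Split the batched result back into one response per request.
        responses = []
        offset = 0
        for inp in inputs:
            n = inp.shape[0]
            out = pb_utils.Tensor("OUTPUT0", result[offset:offset + n])
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
            offset += n
        return responses
```

The feature request is for Triton to perform the concatenation and per-request scatter above itself, so the Python model only ever sees the batched array.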