Support unbatched models in DALI backend

triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.

https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html

MIT License


Support unbatched models in DALI backend #168

Closed by banasraf 1 year ago

banasraf commented 1 year ago

This PR enables support for unbatched models in the DALI backend and adds a dedicated execution path for them.

An unbatched model is a model with max_batch_size set to 0. In that case Triton does not interpret the first dimension of a tensor as the batch dimension, which, among other things, disables dynamic batching.
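To illustrate the difference, here is a minimal Python sketch of how Triton forms the full tensor shape from max_batch_size and the dims declared in the model config. The helper names `full_shape` and `is_unbatched` are hypothetical and are not part of dali_backend or Triton; they only mirror the documented rule.

```python
from typing import List


def is_unbatched(max_batch_size: int) -> bool:
    """A model is unbatched when max_batch_size is 0 (hypothetical helper)."""
    return max_batch_size == 0


def full_shape(max_batch_size: int, dims: List[int]) -> List[int]:
    """Return the tensor shape implied by a model config (hypothetical helper).

    Batched models get a variable-size batch dimension prepended;
    unbatched models use dims as the complete shape.
    """
    if max_batch_size < 0:
        raise ValueError("max_batch_size must be non-negative")
    if is_unbatched(max_batch_size):
        # Unbatched: dims already describe the whole tensor.
        return list(dims)
    # Batched: the first dimension is the batch dimension (-1 = variable).
    return [-1] + list(dims)
```

For a video input declared with dims [-1, 720, 1280, 3] (frames, height, width, channels), an unbatched model receives the tensor exactly as declared, so the frame dimension is never mistaken for a batch.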

This mode is the best fit for our streamed video use case, because there we always want to handle requests one by one.

This required changes to the config validation/autofill (config_tools/*) and a separate, simplified ExecuteUnbatched method in DaliModelInstance.

Signed-off-by: Rafal <rbanas@nvidia.com>