triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Does Triton support multiple TensorFlow backends simultaneously? #7698

Open ragavendrams opened 1 month ago

ragavendrams commented 1 month ago

I would like to know if Triton supports running multiple TensorFlow backends at the same time (e.g., TensorFlow 2.13 and 2.16).

Use case: I have an application whose v1 requires TensorFlow 2.13 and whose v2 requires TensorFlow 2.16. Both versions of the application are in production (currently on a different inference server), and I would like to serve both from a single Triton server instance to avoid allocating multiple GPUs (i.e., one GPU for a Triton instance with the TensorFlow 2.13 backend and another for a Triton instance with the TensorFlow 2.16 backend). A rough sketch of what I mean is below.
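For illustration only, this is roughly the setup I would like to run from one Triton instance. The model names (`app_v1`, `app_v2`), paths, and `config.pbtxt` values are placeholders; only the repository layout and the standard TensorFlow-backend config fields are meant to be taken literally:

```sh
# Hypothetical model repository: two versions of the same application,
# each expected to run against a different TensorFlow release.
#
# model_repository/
# ├── app_v1/                  # needs TensorFlow 2.13
# │   ├── 1/model.savedmodel/
# │   └── config.pbtxt
# └── app_v2/                  # needs TensorFlow 2.16
#     ├── 1/model.savedmodel/
#     └── config.pbtxt

# Minimal TensorFlow-backend config for one of the models (values are placeholders):
cat > model_repository/app_v1/config.pbtxt <<'EOF'
platform: "tensorflow_savedmodel"
backend: "tensorflow"
max_batch_size: 8
EOF

# What I would like: a single server process hosting both models,
# each bound to its own TensorFlow version.
tritonserver --model-repository=$(pwd)/model_repository
```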

Known solution: I have read about Multi-Instance GPU (MIG), which can be used to partition the GPU and allocate one slice to each Triton instance (sketched below). But MIG is not supported on all NVIDIA GPUs (e.g., the 2080 Ti), so I would like to explore other options.
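For context, the MIG-based workaround I am referring to would look roughly like the following: two separate Triton containers, each pinned to one MIG slice of the same GPU. The MIG device IDs, container image tags, and model paths are placeholders, and port publishing is omitted for brevity:

```sh
# Hypothetical MIG workaround: one Triton container per TensorFlow version,
# each restricted to a different MIG slice of the same physical GPU.
# (MIG device IDs and image tags below are placeholders.)

docker run -d --gpus '"device=MIG-<uuid-of-slice-0>"' \
  -v /models/v1:/models \
  nvcr.io/nvidia/tritonserver:<release-with-tf-2.13>-py3 \
  tritonserver --model-repository=/models

docker run -d --gpus '"device=MIG-<uuid-of-slice-1>"' \
  -v /models/v2:/models \
  nvcr.io/nvidia/tritonserver:<release-with-tf-2.16>-py3 \
  tritonserver --model-repository=/models
```

This works in principle, but it requires a MIG-capable GPU, which is exactly the limitation I am trying to avoid.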

Is this possible?

Thanks in advance!