I would like to know if Triton supports running multiple versions of the TensorFlow backend at the same time (e.g., TensorFlow 2.13 and 2.16).
Use case:
I have an application whose v1 requires TensorFlow 2.13 and whose v2 requires TensorFlow 2.16. Both versions are in production (currently on a different inference server), and I would like to serve both from a single Triton server instance to avoid allocating multiple GPUs (i.e., one GPU for a Triton instance with the TensorFlow 2.13 backend and another for a Triton instance with the TensorFlow 2.16 backend).
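For illustration, this is the kind of setup I am hoping for: a single model repository where each model is pinned to a different build of the TensorFlow backend. This is only a sketch of what I have in mind; the backend names `tensorflow_2_13` and `tensorflow_2_16` are my own invention, not documented Triton backends, and I don't know whether Triton can actually load two builds of the TensorFlow backend side by side:

```
model_repository/
├── app_v1_model/
│   ├── config.pbtxt          # would reference the TF 2.13 backend
│   └── 1/
│       └── model.savedmodel/
└── app_v2_model/
    ├── config.pbtxt          # would reference the TF 2.16 backend
    └── 1/
        └── model.savedmodel/
```

with a per-model config along these lines:

```
# app_v1_model/config.pbtxt -- hypothetical: assumes a TF 2.13 build of the
# backend could be installed under its own name in the backends directory
name: "app_v1_model"
backend: "tensorflow_2_13"   # invented name, not a documented Triton backend
max_batch_size: 8
```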
Known solution:
I have read about Multi-Instance GPU (MIG), which can be used to partition a GPU and allocate one slice to each Triton instance. However, MIG is not supported on all NVIDIA GPUs (e.g., the 2080 Ti), so I would like to explore other options.
Is this possible?
Thanks in advance!