Open fahimkm opened 1 year ago
It looks like Seldon says that Triton supports multi-model serving, which it does. That works out of the box: you just load multiple models onto Triton, and our basic documentation covers that.
Overcommitting is not supported by Triton. You can see how model management works here. What you're talking about could be accomplished using EXPLICIT mode if you create the logic for loading and unloading models as needed.
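For illustration, the load/unload logic mentioned above could look something like the minimal sketch below. It is not a Triton feature. It assumes you run Triton with `--model-control-mode=explicit` and pass in a client object that exposes `load_model(name)` and `unload_model(name)`, as `tritonclient.http.InferenceServerClient` does. The class name `LruModelManager`, the `capacity` parameter, and the least-recently-used eviction policy are hypothetical choices made for this example only.

```python
from collections import OrderedDict


class LruModelManager:
    """Keep at most `capacity` models loaded, evicting the least recently used.

    `client` is any object with load_model(name) and unload_model(name);
    the names and the LRU policy here are assumptions, not Triton behavior.
    """

    def __init__(self, client, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._client = client
        self._capacity = capacity
        # Keys are model names, ordered from least to most recently used.
        self._loaded = OrderedDict()

    def ensure_loaded(self, model_name):
        """Call before sending an inference request for `model_name`."""
        if model_name in self._loaded:
            # Already loaded: just mark it as most recently used.
            self._loaded.move_to_end(model_name)
            return
        # Make room by unloading the least recently used models.
        while len(self._loaded) >= self._capacity:
            victim, _ = self._loaded.popitem(last=False)
            self._client.unload_model(victim)
        self._client.load_model(model_name)
        self._loaded[model_name] = None

    def loaded_models(self):
        """Return loaded model names, least recently used first."""
        return list(self._loaded)
```

In use, you would construct something like `LruModelManager(tritonclient.http.InferenceServerClient(url="localhost:8000"), capacity=2)` (the URL and capacity are placeholders) and call `ensure_loaded(name)` before each inference request. A real deployment would likely also need to handle concurrent requests, load failures, and sizing by memory rather than by model count.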
I'm going to mark this as an enhancement. We've filed a ticket to investigate adding this feature.
Is your feature request related to a problem? Please describe. I read in the Seldon Core documentation that multi-model serving with overcommit is available out of the box on NVIDIA Triton: https://docs.seldon.io/projects/seldon-core/en/v2/contents/models/mms/mms.html?highlight=multi%20modal%20serving
Describe the solution you'd like Can you please share documentation on how to configure and implement multi-model serving with overcommit using NVIDIA Triton?