Phelan164 opened this issue 2 years ago
Hey @Phelan164, thanks for trying out ModelMesh!
The number of runtime deployment replicas is determined by the podsPerRuntime
configuration setting and isn't currently scaled dynamically. More info about this setting and scaling can be found here.
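For reference, podsPerRuntime is set in ModelMesh Serving's user configuration ConfigMap. A minimal sketch (the namespace and replica count here are illustrative, and the ConfigMap must exist before the controller picks it up):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # The controller watches for a ConfigMap with this name
  name: model-serving-config
  namespace: modelmesh-serving   # adjust to your install namespace
data:
  config.yaml: |
    # Number of pods each ServingRuntime deployment runs (static, not autoscaled)
    podsPerRuntime: 2
```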
For model placement, ModelMesh first finds a ServingRuntime whose SupportedModelFormat list includes a compatible model type/format. Which pod of the selected ServingRuntime deployment is chosen for placement is generally determined by factors such as pod request load and cache age. ModelMesh also manages how many copies of a model are loaded: recently used models have at least two copies loaded, and this number can scale up or down based on usage.
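To illustrate the format-matching step, here is a sketch of the supportedModelFormats section of a ServingRuntime spec (the runtime name, format, and container details are illustrative, not from this thread):

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-sklearn-runtime   # illustrative name
spec:
  # A predictor whose model format matches an entry here can be
  # placed on pods of this runtime's deployment
  supportedModelFormats:
    - name: sklearn
      version: "1"
      autoSelect: true
  containers:
    - name: mlserver               # illustrative serving container
      image: seldonio/mlserver:1.3.2
```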
Some useful links:
- https://developer.ibm.com/blogs/kserve-and-watson-modelmesh-extreme-scale-model-inferencing-for-trusted-ai/
- https://www.youtube.com/watch?v=rmYXPlzU4H8
@pvaneck we have a use case with stateful models (continuous learning based on feedback). For that case, is it possible to restrict a model to a single pod? Is there a config setting to control that?
We are not concerned about load on a single model, as these are individual users' personal models.
Thanks for the great project; this is definitely a useful initiative.
Overview
As a user, I want ModelMesh to automatically and efficiently orchestrate models onto available runtime servers, so that I don't need to care about where each model will be placed.
Acceptance Criteria
Questions
Assumptions
Reference