kserve / modelmesh-serving

Controller for ModelMesh
Apache License 2.0

[Need Help] Model orchestration documentation #102

Open Phelan164 opened 2 years ago

Phelan164 commented 2 years ago

Overview

As a user, I want ModelMesh to automatically and efficiently orchestrate models onto the available runtime servers, so that I don't need to care about where each model is placed.


pvaneck commented 2 years ago

Hey @Phelan164, thanks for trying out ModelMesh!

The number of runtime deployment pods is determined by the `podsPerRuntime` configuration setting and isn't currently scaled dynamically. More info about this setting and scaling can be found here.
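For reference, a minimal sketch of how that setting is supplied, assuming the `model-serving-config` ConfigMap name and `modelmesh-serving` namespace used in the docs:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # ModelMesh reads user configuration from a ConfigMap with this name
  name: model-serving-config
  namespace: modelmesh-serving
data:
  config.yaml: |
    # Each ServingRuntime deployment gets this fixed number of pods;
    # the count is not adjusted dynamically based on load.
    podsPerRuntime: 2
```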

For model placement, ModelMesh first finds a ServingRuntime that lists a compatible model type/format in its `supportedModelFormats` list. Which pod of the selected ServingRuntime deployment is chosen for the model is then determined by factors such as per-pod request load and cache age. ModelMesh also manages how many copies of a model are loaded: recently used models keep at least two copies loaded, and that number can scale up or down based on usage.
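To illustrate the format matching, here's a rough sketch (the names, image, and storage URI are placeholders, and some required ServingRuntime fields are trimmed): the predictor's `modelFormat` is matched against each runtime's `supportedModelFormats` entries.

```yaml
# A multi-model ServingRuntime advertising the formats it can load
# (illustrative; a real runtime spec needs full container/endpoint details).
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-sklearn-runtime
spec:
  multiModel: true
  supportedModelFormats:
    - name: sklearn
      version: "1"
      autoSelect: true
  containers:
    - name: mlserver
      image: example.com/mlserver:latest   # placeholder image
---
# A predictor whose modelFormat is matched against the runtimes'
# supportedModelFormats to determine where the model can be placed.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-model
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://example-bucket/sklearn/model.joblib  # placeholder path
```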

Some useful links:

- https://developer.ibm.com/blogs/kserve-and-watson-modelmesh-extreme-scale-model-inferencing-for-trusted-ai/
- https://www.youtube.com/watch?v=rmYXPlzU4H8

Nagarajj commented 2 years ago

@pvaneck we have a use case with a stateful model (continuous learning based on feedback). For that case, is it possible to restrict a model to a single pod? Is there a config to control that?

We are not concerned about load on a single model, as these are individual users' personal models.

Thanks for the great project; this is definitely a useful initiative.