Phelan164 opened this issue 2 years ago
Hey @Phelan164, thanks for trying out ModelMesh!
The number of runtime deployment replicas is determined by the podsPerRuntime
configuration setting and isn't currently scaled dynamically. More info about this setting and scaling can be found here.
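For reference, podsPerRuntime is set in ModelMesh Serving's user configuration ConfigMap. A minimal sketch (the namespace and replica count here are illustrative, and the ConfigMap must exist before the controller picks it up):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # The controller watches for a ConfigMap with this name
  name: model-serving-config
  namespace: modelmesh-serving   # adjust to your install namespace
data:
  config.yaml: |
    # Number of pods each ServingRuntime deployment runs (static, not autoscaled)
    podsPerRuntime: 2
```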
For model placement, ModelMesh first finds a ServingRuntime whose SupportedModelFormat list includes a compatible model type/format. Which pod of the selected ServingRuntime deployment is chosen for placement is generally determined by factors such as pod request load and cache age. ModelMesh also manages how many copies of a model are loaded: recently used models have at least two copies loaded, and this number can scale up or down based on usage.
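To illustrate the format-matching step, here is a sketch of the supportedModelFormats section of a ServingRuntime spec (the runtime name, format, and container details are illustrative, not from this thread):

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-sklearn-runtime   # illustrative name
spec:
  # A predictor whose model format matches an entry here can be
  # placed on pods of this runtime's deployment
  supportedModelFormats:
    - name: sklearn
      version: "1"
      autoSelect: true
  containers:
    - name: mlserver               # illustrative serving container
      image: seldonio/mlserver:1.3.2
```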
Some useful links:
- https://developer.ibm.com/blogs/kserve-and-watson-modelmesh-extreme-scale-model-inferencing-for-trusted-ai/
- https://www.youtube.com/watch?v=rmYXPlzU4H8
@pvaneck we have a use case with stateful models (continuous learning based on feedback). For that case, is it possible to restrict a model to a single pod? Is there a config setting to control that?
We are not concerned about load on a single model, as these are individual users' personal models.
Thanks for the great project; this is definitely a useful initiative.
Overview
As a user, I want ModelMesh to automatically and efficiently orchestrate models onto available runtime servers, so that I don't need to care about where each model will be placed.
Acceptance Criteria
Questions
Assumptions
Reference