Open · tylertitsworth opened 9 months ago
Hi @tylertitsworth, in the model-store directory you can have multiple models, right? So how do you decide what to load/unload? If there are 3 models, for example, and I delete one, which of the 2 remaining would get loaded?
Do you have an example of how this would be used?
Assuming we are reusing the existing registration system, the model's archive file path would be stored in the registered model's metadata.
When polling detects that an archive has been deleted, the server would find any models whose metadata stores that file path and unregister only those models, rather than reloading all of the models in the store.
This is the same functionality I have experienced with Triton Inference Server.
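To make the idea concrete, here is a rough Python sketch of that check, written as an external watcher driving TorchServe's management API on its default port 8081. The `registered_paths` map, the model-name convention, and the fixed `"1.0"` version are hypothetical placeholders for this sketch, not existing TorchServe internals; a built-in implementation would keep this bookkeeping in the registered-model metadata described above.

```python
import os
import requests

MODEL_STORE = "/home/model-server/model-store"
MANAGEMENT_API = "http://localhost:8081"  # TorchServe management API, default port

# Hypothetical bookkeeping: archive path -> (model_name, version) pairs
# registered from that archive. A native implementation would store this
# in the registered model's metadata instead.
registered_paths: dict[str, list[tuple[str, str]]] = {}

def poll_once() -> None:
    """One polling pass: unregister only the models whose .mar file was
    deleted, and register newly added .mar files, instead of reloading
    the whole store."""
    present = {
        os.path.join(MODEL_STORE, f)
        for f in os.listdir(MODEL_STORE)
        if f.endswith(".mar")
    }

    # Deletion: unregister just the models tied to the missing archive.
    for path in list(registered_paths):
        if path not in present:
            for name, version in registered_paths.pop(path):
                requests.delete(f"{MANAGEMENT_API}/models/{name}/{version}")

    # Addition: register archives we have not seen before.
    for path in present - set(registered_paths):
        mar = os.path.basename(path)
        resp = requests.post(
            f"{MANAGEMENT_API}/models",
            params={"url": mar, "initial_workers": 1},
        )
        if resp.ok:
            # Assume model name == archive name and a "1.0" version for the sketch.
            registered_paths[path] = [(os.path.splitext(mar)[0], "1.0")]
```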
🚀 The feature
A polling mode, similar to Triton Inference Server's, that checks for when the model-store has changed, and loads/unloads `.mar` files as they are added to or removed from the store. This removes the need for a separate service to register models as they are created in an MLOps pipeline.

Motivation, pitch
Triton Inference Server is the most popular serving technology on the market. Many users who would otherwise use TorchServe choose Triton Inference Server because of this feature. It reduces ops overhead and makes the serving platform more autonomous.
Alternatives
No response
Additional context
In this example, a TorchScript model is created and used for benchmarking. By adding `--model-control-mode=poll` to L43 of `start.sh`, we can manipulate the server's registered models by creating a new directory `2` and copying `model.py` into it, then watching the logs as v1 is removed and v2 is added. Similarly, if I delete the directory `2`,
it removes v2 and re-registers v1. This is the functionality I'd like to achieve.
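For reference, a minimal sketch of the directory manipulation described above; the repository path and model name are hypothetical stand-ins for the linked example's layout, and it assumes Triton was started with `--model-control-mode=poll` so the repository is re-scanned on an interval:

```python
import shutil
import time
from pathlib import Path

# Hypothetical Triton model repository layout: <repo>/<model>/<version>/model.py
model_dir = Path("model_repository/my_model")

# Add version 2 by copying the existing version 1 artifacts.
shutil.copytree(model_dir / "1", model_dir / "2")
time.sleep(30)  # wait past the poll interval; logs should show v1 unloaded, v2 loaded

# Remove version 2 again; polling should re-register v1.
shutil.rmtree(model_dir / "2")
```

The v1-to-v2 swap happens because Triton's default version policy serves only the latest version, so adding directory `2` causes version 1 to be unloaded, and removing it brings version 1 back.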