pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Polling Mode #2914

Open tylertitsworth opened 9 months ago

tylertitsworth commented 9 months ago

🚀 The feature

A polling mode, similar to Triton Inference Server's, that checks when the model-store has changed and loads/unloads .mar files as they are added to or removed from the store. This would remove the need for a separate service to register models as they are created in an MLOps pipeline.
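
Today this can be approximated with an external watcher that hits the existing management API; a built-in polling mode would make that glue unnecessary. A rough sketch of such a watcher (the model-store path, poll interval, and worker count are placeholders, not part of the proposal):

```python
# External poller approximating the requested behavior via the TorchServe
# management API (default port 8081). Paths and interval are assumptions.
import time
from pathlib import Path

import requests

MODEL_STORE = Path("/home/model-server/model-store")  # assumed model-store path
MGMT = "http://localhost:8081"
POLL_SECS = 15


def snapshot():
    """Map model name -> .mar filename for everything currently in the store."""
    return {p.stem: p.name for p in MODEL_STORE.glob("*.mar")}


seen = snapshot()
# Register whatever is already present at startup.
for name, mar in seen.items():
    requests.post(f"{MGMT}/models",
                  params={"url": mar, "model_name": name, "initial_workers": 1})

while True:
    time.sleep(POLL_SECS)
    current = snapshot()
    # Newly added archives -> register them.
    for name in current.keys() - seen.keys():
        requests.post(f"{MGMT}/models",
                      params={"url": current[name], "model_name": name,
                              "initial_workers": 1})
    # Removed archives -> unregister the matching model.
    for name in seen.keys() - current.keys():
        requests.delete(f"{MGMT}/models/{name}")
    seen = current
```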

Motivation, pitch

Triton Inference Server is the most popular serving technology on the market, and many users who would otherwise use TorchServe choose Triton Inference Server because of this feature. It reduces ops overhead and makes the serving platform more autonomous.

Alternatives

No response

Additional context

In this example, a TorchScript model is created and used for benchmarking. By adding --model-control-mode=poll to L43 of start.sh, we can manipulate the server's registered models by creating a new version directory 2 and copying model.py into it, then watching the logs as v1 is removed and v2 is added. Similarly, if I delete directory 2, it removes v2 and re-registers v1. This is the functionality I'd like to see achieved; the steps are sketched below.
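
For reference, the directory manipulation described above, sketched in Python against an assumed Triton repository layout (model_repository/&lt;model&gt;/&lt;version&gt;/model.py; the repository path and model name are placeholders):

```python
# Sketch of the version-directory changes that Triton's poll mode reacts to.
import shutil
from pathlib import Path

repo = Path("model_repository/my_model")  # assumed repository/model name
v1, v2 = repo / "1", repo / "2"

# Create version 2 by copying the existing model file; with
# --model-control-mode=poll the server notices the change, unloads v1
# and loads v2 (default policy serves the latest version).
v2.mkdir(exist_ok=True)
shutil.copy(v1 / "model.py", v2 / "model.py")

# Removing version 2 later causes the poller to unload v2 and re-register v1.
# shutil.rmtree(v2)
```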

agunapal commented 9 months ago

Hi @tylertitsworth The model-store directory can contain multiple models, right? So how do you decide what to load/unload? If there are 3 models, for example, and I delete one, which of the 2 remaining would get loaded?

Do you have an example of how this would be used?

tylertitsworth commented 9 months ago

Assuming we reuse the existing registration system, the model's archive file path would be stored in the registered model's metadata.

When polling detects a deletion, the server would look up any registered models whose stored file path matches the deleted archive and unregister only those models, rather than reloading all of the models in the store.

This is the same functionality I have experienced with the Triton Inference Server.
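
A minimal sketch of that path-keyed bookkeeping, with a hypothetical in-memory registry and hypothetical unregister/register_new callbacks standing in for the real registration calls:

```python
# One polling pass keyed on the archive path stored in each model's metadata:
# only models whose backing .mar disappeared are unregistered.
from pathlib import Path

# Hypothetical in-memory view of the registration metadata.
registry = {
    "resnet18": {"archive": Path("model-store/resnet18.mar"), "version": "1.0"},
    "bert":     {"archive": Path("model-store/bert.mar"),     "version": "2.0"},
}


def poll_once(unregister, register_new):
    """Unregister models whose archive was deleted; register new archives."""
    known = {meta["archive"] for meta in registry.values()}
    # Unregister only the models whose stored archive path no longer exists.
    for name, meta in list(registry.items()):
        if not meta["archive"].exists():
            unregister(name)
            del registry[name]
    # Register archives that appeared in the store but are not tracked yet.
    for mar in Path("model-store").glob("*.mar"):
        if mar not in known:
            register_new(mar)
```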