pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/

load initial handlers in parallel #1928

Open · opened by hgong-snap 2 years ago

hgong-snap commented 2 years ago

🚀 The feature

It seems that initial handlers are loaded sequentially across different models (handlers for the same model are loaded in parallel). When serving many models in production, this significantly slows down spinning up a new server. Would it be possible to load all handlers in parallel? E.g., on a 32-core machine, server startup should ideally initialize 32 workers in parallel. This would dramatically decrease startup time and allow better scaling during a traffic surge.

Motivation, pitch

see above

Alternatives

No response

Additional context

No response

lxning commented 2 years ago

@hgong-snap Currently, TorchServe supports model loading in 2 ways:

- at startup, via the `load_models` setting in `config.properties` (or the `--models` CLI flag), where models are loaded sequentially;
- at runtime, via the management API (`POST /models`), which registers models on demand.
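For reference, a minimal `config.properties` for the startup path might look like the following; the model store path and `.mar` archive names are placeholders:

```properties
# Models listed in load_models are registered sequentially at server startup.
model_store=/path/to/model-store
load_models=model_a.mar,model_b.mar
```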

The short-term workaround for your production is a custom script that registers the models in parallel through the management API, instead of relying on sequential loading at startup.
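A minimal sketch of such a script, assuming the default management API address (`http://localhost:8081`) and placeholder `.mar` archives already present in the model store:

```python
import concurrent.futures
import requests

MANAGEMENT_API = "http://localhost:8081"  # default TorchServe management address
# Placeholder archive names; replace with the .mar files in your model store.
MODELS = ["model_a.mar", "model_b.mar", "model_c.mar"]

def register(mar_file: str) -> int:
    # POST /models registers a model; synchronous=true makes the call
    # return only once the initial workers are up.
    resp = requests.post(
        f"{MANAGEMENT_API}/models",
        params={"url": mar_file, "initial_workers": 1, "synchronous": "true"},
    )
    return resp.status_code

# Issue the registration calls concurrently so models load in parallel
# instead of waiting on TorchServe's sequential startup loading.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for mar, status in zip(MODELS, pool.map(register, MODELS)):
        print(mar, status)
```

Because each registration is an independent HTTP request, a thread pool is enough to overlap the load times; the pool size bounds how many models load concurrently.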

hgong-snap commented 2 years ago

thanks @lxning. Curious, is it possible to prioritize this? It seems to limit the ability to serve a large number (>100) of models. A custom script is a good workaround, but it is just not a scalable approach.

msaroufim commented 2 years ago

Hi @hgong-snap, we're planning a release for the middle of this month and can prioritize this ask right after that.

hgong-snap commented 2 years ago

thanks @msaroufim. Please prioritize it after the release, and please let me know when it's implemented. Thanks!

msaroufim commented 2 years ago

Sounds good cc @lxning for visibility