pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/

load initial handlers in parallel #1928

Open · opened by hgong-snap 2 years ago

hgong-snap commented 2 years ago

🚀 The feature

It seems that initial handlers are loaded sequentially across different models (handlers for the same model are loaded in parallel). When serving many models in production, this significantly slows down spinning up a new server. Would it be possible to load all handlers in parallel? E.g., on a 32-core machine, server startup should ideally initialize 32 workers in parallel. This would dramatically decrease startup time and allow better scaling during a traffic surge.

Motivation, pitch

see above

Alternatives

No response

Additional context

No response

lxning commented 2 years ago

@hgong-snap Currently, TorchServe supports model loading in 2 ways:

- at startup, via the `load_models` setting in `config.properties` (or the `--models` CLI flag), where models are loaded sequentially;
- at runtime, via the management API (`POST /models`), which registers models on demand.
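For reference, a minimal `config.properties` for the startup path might look like the following; the model store path and `.mar` archive names are placeholders:

```properties
# Models listed in load_models are registered sequentially at server startup.
model_store=/path/to/model-store
load_models=model_a.mar,model_b.mar
```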

The short-term workaround for your production is a custom script that registers the models in parallel through the management API, instead of relying on sequential loading at startup.
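A minimal sketch of such a script, assuming the default management API address (`http://localhost:8081`) and placeholder `.mar` archives already present in the model store:

```python
import concurrent.futures
import requests

MANAGEMENT_API = "http://localhost:8081"  # default TorchServe management address
# Placeholder archive names; replace with the .mar files in your model store.
MODELS = ["model_a.mar", "model_b.mar", "model_c.mar"]

def register(mar_file: str) -> int:
    # POST /models registers a model; synchronous=true makes the call
    # return only once the initial workers are up.
    resp = requests.post(
        f"{MANAGEMENT_API}/models",
        params={"url": mar_file, "initial_workers": 1, "synchronous": "true"},
    )
    return resp.status_code

# Issue the registration calls concurrently so models load in parallel
# instead of waiting on TorchServe's sequential startup loading.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for mar, status in zip(MODELS, pool.map(register, MODELS)):
        print(mar, status)
```

Because each registration is an independent HTTP request, a thread pool is enough to overlap the load times; the pool size bounds how many models load concurrently.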

hgong-snap commented 2 years ago

thanks @lxning. Curious, is it possible to prioritize this? It seems to limit the ability to serve a large number (>100) of models. A custom script is a good workaround, but it is just not a scalable approach.

msaroufim commented 2 years ago

Hi @hgong-snap, we're planning a release for the middle of this month and can prioritize this ask right after that.

hgong-snap commented 2 years ago

thanks @msaroufim. Please prioritize it after the release, and please let me know when it's implemented. Thanks!

msaroufim commented 2 years ago

Sounds good cc @lxning for visibility