hgong-snap opened 2 years ago
@hgong-snap Currently, TorchServe supports model loading in 2 ways:

1. Load models during TorchServe initialization. In this case, models from the model store are loaded sequentially. I agree that parallel model loading at TorchServe startup is very useful for reducing initialization latency; we will add this feature request to the backlog.
2. Register models via the REST or gRPC API. In this case, models are loaded in parallel (see the sketch below).
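For reference, a single registration call against the management API can be as small as the following sketch (assuming TorchServe's default management port 8081 and a hypothetical archive `my_model.mar` already present in the model store):

```python
import requests

# Register one model via the management API.
# initial_workers spins up backend workers right away;
# synchronous=false returns without waiting for them.
resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "my_model.mar",   # archive file name in the model store
        "initial_workers": 1,
        "synchronous": "false",
    },
)
print(resp.status_code, resp.text)
```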
The short-term workaround for your production is:

1. Start TorchServe without loading any models: `torchserve --ncs --start --model-store model_store`
2. Write a script that sends the model registration requests to TorchServe concurrently, e.g. the sketch after this list.
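One possible shape for that script (a minimal sketch, not an official utility; it assumes the management API on localhost:8081 and a hypothetical list of `.mar` files sitting in the model store):

```python
import requests
from concurrent.futures import ThreadPoolExecutor

MANAGEMENT_URL = "http://localhost:8081/models"
MODEL_ARCHIVES = ["model_a.mar", "model_b.mar", "model_c.mar"]  # hypothetical names

def register(mar_file: str) -> str:
    # synchronous=true blocks until the model's workers are up,
    # so the parallelism comes from the thread pool, not the server.
    resp = requests.post(
        MANAGEMENT_URL,
        params={"url": mar_file, "initial_workers": 1, "synchronous": "true"},
    )
    return f"{mar_file}: HTTP {resp.status_code}"

# Fire all registrations concurrently; tune max_workers to the host.
with ThreadPoolExecutor(max_workers=32) as pool:
    for result in pool.map(register, MODEL_ARCHIVES):
        print(result)
```

Because each registration is an independent HTTP request, TorchServe loads these models in parallel, which approximates the startup behavior this issue asks for.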
thanks @lxning, curious if it's possible to prioritize this? It seems to limit the ability to serve a large number (>100) of models. A custom script is a good workaround, but it's just not a scalable way to do it.
Hi @hgong-snap, we're planning a release for the middle of this month and can prioritize this ask right after that.
Thanks @msaroufim. Please prioritize it after the release, and let me know when it's implemented. Thanks!
Sounds good cc @lxning for visibility
🚀 The feature
It seems that initial handlers are loaded sequentially across different models (handlers for the same model are loaded in parallel, though). When serving many models in production, this significantly slows down spinning up a new server. Would it be possible to load all handlers in parallel? E.g., on a 32-core machine, server startup should ideally run 32 workers in parallel. This would dramatically decrease startup time and let the fleet scale up better during a traffic surge.
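To make the ask concrete, the gap is roughly the following (illustrative pseudocode only; `load_model` stands in for whatever the frontend does per model today, and all names are hypothetical):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def load_model(mar_file: str) -> None:
    ...  # stand-in for TorchServe's per-model startup work

model_store = ["m1.mar", "m2.mar", "m3.mar"]  # hypothetical archives

# Today (roughly): models are loaded one after another at startup.
for mar in model_store:
    load_model(mar)

# Requested: load up to one model per core concurrently.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    list(pool.map(load_model, model_store))
```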
Motivation, pitch
see above
Alternatives
No response
Additional context
No response