pytorch/serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Serving large models on multiple GPUs #2183

Open yurkoff-mv opened 1 year ago

yurkoff-mv commented 1 year ago

🚀 The feature

How to deploy a model service that spans multiple GPUs?

Motivation, pitch

I have a large model that I run via torchrun, using the FairScale library to distribute it across GPUs. How can I integrate this with TorchServe? I need to initialize the model with its weights, distribute it across multiple GPUs, and have it wait for a batch to run inference on. But torchrun can only launch a script with a fixed set of arguments. How can I write a handler in this case?
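
Concretely, I am after something like the handler sketched below. It uses FairScale's single-process `Pipe` for illustration, since one TorchServe worker process can still address several GPUs; the toy network, checkpoint name, and layer split are placeholders, not working code for my actual model.

```python
import torch
import torch.nn as nn
from fairscale.nn import Pipe
from ts.torch_handler.base_handler import BaseHandler


class MultiGpuHandler(BaseHandler):
    def initialize(self, ctx):
        model_dir = ctx.system_properties.get("model_dir")
        # Toy stand-in for the real network: Pipe needs an nn.Sequential
        # so it can split the stages across devices.
        model = nn.Sequential(
            nn.Linear(1024, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, 1024), nn.ReLU(),
            nn.Linear(1024, 10), nn.ReLU(),
        )
        # Placeholder checkpoint name inside the model archive.
        model.load_state_dict(torch.load(f"{model_dir}/model.pt"))
        # balance = number of submodules per GPU; must sum to len(model).
        # This single worker process now spans cuda:0 and cuda:1.
        self.model = Pipe(model, balance=[4, 4], devices=[0, 1], chunks=2)
        self.model.eval()
        self.initialized = True

    def inference(self, data, *args, **kwargs):
        # Inputs feed the first pipeline stage; the output tensor
        # comes back on the last device.
        with torch.no_grad():
            return self.model(data.to("cuda:0"))
```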

Alternatives

No response

Additional context

No response

lxning commented 1 year ago

@yurkoff-mv TorchServe will soon provide a new feature, "TorchServe Open Platform for Large Distributed Model Inference", which should address your questions.

lxning commented 1 year ago

@yurkoff-mv torchrun has been integrated into TorchServe. Here is an example PR.
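
For context, the integration is configured per model in model-config.yaml, roughly along the lines of the published large-model examples (a sketch; exact keys and values may vary by release):

```yaml
# TorchServe frontend parameters (sketch; values are illustrative)
minWorkers: 1
maxWorkers: 1
maxBatchDelay: 100
responseTimeout: 1200
parallelType: "pp"      # pipeline parallelism; "tp" for tensor parallelism
deviceType: "gpu"
torchrun:
  nproc-per-node: 4     # torchrun launches 4 worker processes, one per GPU
```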

yurkoff-mv commented 1 year ago

Thank you! Unfortunately, I cannot view these files. When will they be publicly available? One more question: is there a plan to support GPUs spread across several nodes (multi-node)?

lxning commented 1 year ago

@yurkoff-mv here is the link to the DeepSpeed example.

A controller will be added later to support multi-node inference.
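
The gist of such a DeepSpeed handler is sketched below, assuming TorchServe launches the worker processes via torchrun so that WORLD_SIZE and LOCAL_RANK are set. The model, tokenizer, and generation parameters are placeholders, and `mp_size` follows the DeepSpeed inference API of that time; see the linked example for the real code.

```python
import os

import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler


class DeepSpeedTextHandler(BaseHandler):
    def initialize(self, ctx):
        model_dir = ctx.system_properties.get("model_dir")
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        model = AutoModelForCausalLM.from_pretrained(model_dir)
        # Shard the model across this node's GPUs with tensor parallelism;
        # torchrun has started one copy of this process per GPU.
        self.engine = deepspeed.init_inference(
            model,
            mp_size=int(os.environ["WORLD_SIZE"]),
            dtype=torch.float16,
            replace_with_kernel_inject=True,
        )
        self.device = torch.device("cuda", int(os.environ["LOCAL_RANK"]))
        self.initialized = True

    def inference(self, data, *args, **kwargs):
        # `data` here is assumed to be the already-preprocessed prompt text.
        inputs = self.tokenizer(data, return_tensors="pt").to(self.device)
        with torch.no_grad():
            out = self.engine.module.generate(**inputs, max_new_tokens=64)
        return self.tokenizer.batch_decode(out, skip_special_tokens=True)
```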