huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Feature request: support for multi-node, multi-GPU distributed inference #1890

Open · runopti opened this issue 1 year ago

runopti commented 1 year ago

Motivation

Currently, device_map="auto" only supports a single-node, multi-GPU setup (https://github.com/huggingface/transformers/issues/24747). If you have access to 8xA100 80GB/40GB, things are mostly fine, but not everyone has such a setup, so I believe there is real benefit in supporting multi-node, multi-GPU inference. For example, someone with access to several 4xV100 nodes but no 8xA100 node could then experiment with large models like Llama-2-70B. As far as I know, there is currently no straightforward way to run inference with models of that scale on multi-node V100 setups.
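For context, here is a minimal sketch of the single-node flow the issue refers to. `device_map="auto"` shards a model across all GPUs visible on one machine (with optional CPU/disk offload), but it has no way to place layers on GPUs attached to other nodes. The model ID is just an example; it assumes access to the gated Llama-2 checkpoint.

```python
# Single-node, multi-GPU inference via device_map="auto" (what works today).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # example; any large causal LM works

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # shards layers across the GPUs of THIS node only
    torch_dtype=torch.float16,
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```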

SunMarc commented 1 year ago

Hi @runopti, thanks for the suggestion! It's fairly niche and difficult to implement. If there is a lot of demand for this feature (react with 👍), we will prioritize it.
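For reference, what Accelerate does already support across nodes is data-parallel inference via `Accelerator.split_between_processes`, which divides a batch of prompts between processes. A hedged sketch follows; note that every process holds a full copy of the model, so this does not shard one oversized model across nodes and does not address the Llama-2-70B-on-V100s use case above.

```python
# Data-parallel multi-node inference sketch: each process loads the FULL model
# and handles a slice of the prompts. Launch with something like:
#   accelerate launch --num_machines 2 --num_processes 8 ... script.py
from accelerate import Accelerator
from transformers import pipeline

accelerator = Accelerator()

pipe = pipeline(
    "text-generation",
    model="gpt2",               # placeholder: any model that fits on a single GPU
    device=accelerator.device,
)

prompts = ["Hello, my name is", "The weather today is", "Accelerate makes it"]

# Context manager that gives each process its shard of the input list.
with accelerator.split_between_processes(prompts) as local_prompts:
    results = [pipe(p, max_new_tokens=10)[0]["generated_text"] for p in local_prompts]

print(f"rank {accelerator.process_index}: {results}")
```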

ericbugin commented 1 year ago

I really need multi-node support as well because I work on Dataproc and EMR clusters.

ansSanthoshM commented 7 months ago

I am also looking for this feature. I need to set up an inference server for a multi-node, multi-GPU cluster.

UmutAlihan commented 5 months ago

I am also very much looking forward to this feature. Multi-node, multi-GPU inference would be an immense advantage for devs who have many small, cheap GPUs connected to the same network across different bare-metal machines in the same room, or even in different parts of the world over a private network. The possibilities are endless.

jacklanda commented 4 months ago

ditto

zzhoo8 commented 4 months ago

up up