huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0
7.97k stars 970 forks

[docs] add xpu part and fix bug in `torchrun` #3166

Closed faaany closed 3 weeks ago

faaany commented 1 month ago

What does this PR do?

This PR adds an XPU part to the docs and uses `nnodes` instead of `num_machines` for `torchrun`.
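
For context, `torchrun` takes the node count through `--nnodes`, while `--num_machines` is the matching option of `accelerate launch`. The snippet below is a minimal Python sketch of that distinction and is not taken from this PR's diff. The helper name `torchrun_command` and the script name `train.py` are hypothetical and used only for illustration.

```python
import shlex


def torchrun_command(script, nnodes, nproc_per_node, node_rank=0):
    """Build a torchrun command line using torchrun's own flag names.

    Hypothetical helper for illustration; it only assembles a string
    and does not launch anything.
    """
    args = [
        "torchrun",
        f"--nnodes={nnodes}",                  # number of nodes (not --num_machines)
        f"--nproc_per_node={nproc_per_node}",  # worker processes per node
        f"--node_rank={node_rank}",            # rank of this node
        script,
    ]
    return shlex.join(args)


# Assumed example values: 2 nodes with 8 processes each.
print(torchrun_command("train.py", nnodes=2, nproc_per_node=8))
```

Each node would run the same command with its own `node_rank`. A real multi-node launch would also need rendezvous settings such as the master address and port, which this sketch leaves out.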

@muellerzr

muellerzr commented 1 month ago

@faaany lmk if this is good to merge

HuggingFaceDocBuilderDev commented 1 month ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.