huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Added a MultiCPU SLURM example using Accelerate Launch and MPIRun #2902

Closed: okhleif-IL closed this pull request 44 minutes ago

okhleif-IL commented 5 days ago

What does this PR do?

This PR adds a multi-CPU SLURM example to the examples/slurm folder, using Accelerate launch and MPIRun. The new script submits a multi-CPU SLURM training job that runs complete_nlp_example.py.
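
For readers unfamiliar with this kind of launch, here is a minimal sketch of how such a script might assemble its command. It is not the script from this PR. The helper name `build_launch_cmd`, the port 29555, the `hostfile` filename, and the fallback to `localhost` outside SLURM are all assumptions made for illustration. It also assumes an Accelerate version whose `accelerate launch` accepts `--mpirun_hostfile`. The sketch only prints the command; it does not run it.

```shell
#!/bin/bash
# Illustrative sketch only: build (and print) an `accelerate launch` command
# for a multi-CPU job. Names and default values here are assumptions.

# Hypothetical helper: echoes the launch command for the given settings.
# Args: num_nodes instances_per_node head_node port hostfile script
build_launch_cmd() {
  local num_nodes="$1" instances_per_node="$2" head_node="$3"
  local port="$4" hostfile="$5" script="$6"
  echo "accelerate launch" \
    "--num_processes $((num_nodes * instances_per_node))" \
    "--num_machines ${num_nodes}" \
    "--main_process_ip ${head_node}" \
    "--main_process_port ${port}" \
    "--mpirun_hostfile ${hostfile}" \
    "${script} --cpu"
}

if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_NODELIST:-}" ]; then
  # Inside a SLURM allocation: write one hostname per line for mpirun.
  scontrol show hostnames "$SLURM_JOB_NODELIST" > hostfile
  head_node="$(head -n 1 hostfile)"
  num_nodes="${SLURM_NNODES:-1}"
else
  # Outside SLURM: fall back to a single local node so the sketch still runs.
  head_node="localhost"
  num_nodes=1
fi

CMD="$(build_launch_cmd "$num_nodes" 1 "$head_node" 29555 hostfile examples/complete_nlp_example.py)"
echo "$CMD"
```

In a real batch script the printed command would be executed after activating the Python environment, and the `#SBATCH` directives (node count, tasks per node, output files) would sit at the top of the file.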

This PR also fixes a minor typo in the environment-activation step of all the other SLURM example .sh scripts.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev commented 1 day ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.