Before submitting
[x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
What does this PR do?
This PR adds a multi-CPU SLURM example in the examples/slurm folder using Accelerate and mpirun. The script runs a multi-CPU SLURM training job based on the complete_nlp_example.py script.

This PR also fixes a minor typo in all other SLURM example .sh scripts regarding activating the environment.
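For readers unfamiliar with this setup, a multi-CPU SLURM submission script generally combines a SLURM batch header, environment activation, and an mpirun-based launch. The sketch below is illustrative only and is not the script added by this PR: the job name, node count, environment name, and launch flags are all assumptions.

```shell
#!/bin/bash
#SBATCH --job-name=multicpu_accelerate   # illustrative job name
#SBATCH --nodes=2                        # assumed node count, one launcher per node
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive

# Activate the Python environment (the minor typo fixed by this PR concerns
# this activation step in the other SLURM example scripts); the environment
# name is a placeholder.
source activate accelerate_env

# Launch one process per node with mpirun; Accelerate then runs the
# complete_nlp_example.py training script on CPU. The exact flags used in
# the PR's script may differ.
mpirun -np "$SLURM_NNODES" accelerate launch --cpu complete_nlp_example.py
```

In this pattern, SLURM allocates the nodes, mpirun handles process placement across them, and accelerate launch wires up the distributed environment for the training script.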
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.