huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Give example on how to handle gradient accumulation with cross-entropy #3193

Open ylacombe opened 4 weeks ago

ylacombe commented 4 weeks ago

What does this PR do?

Following the recent discussions of how gradient accumulation with the cross-entropy loss is often computed incorrectly, it would be great to have this mentioned in the docs. I've therefore added some code and an explanation of it to the gradient accumulation page.
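The issue being documented can be sketched in plain PyTorch (this is an illustrative toy example, not the code added in the PR): averaging the cross-entropy per micro-batch and then averaging across accumulation steps over-weights tokens from short micro-batches. The fix is to sum the per-token losses and divide by the total token count of the full effective batch.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 10

# Two micro-batches of one accumulated batch, with different token counts
micro_batches = [
    (torch.randn(4, vocab_size), torch.randint(0, vocab_size, (4,))),    # 4 tokens
    (torch.randn(12, vocab_size), torch.randint(0, vocab_size, (12,))),  # 12 tokens
]

# Naive: mean loss per micro-batch, then mean over steps.
# Each of the 4 tokens in the first micro-batch ends up weighted 3x as
# heavily as each of the 12 tokens in the second one.
naive = sum(F.cross_entropy(logits, labels) for logits, labels in micro_batches)
naive = naive / len(micro_batches)

# Correct: sum per-token losses, divide by the total token count, so every
# token in the effective batch contributes equally.
total_tokens = sum(labels.numel() for _, labels in micro_batches)
correct = sum(
    F.cross_entropy(logits, labels, reduction="sum")
    for logits, labels in micro_batches
) / total_tokens

print(f"naive={naive.item():.4f}  correct={correct.item():.4f}")
```

The `correct` value matches the cross-entropy computed over all tokens in a single pass, while the naive average differs whenever the micro-batches contain different numbers of tokens.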

cc @SunMarc and @muellerzr, let me know what you think of it, or whether I can make it any clearer!

HuggingFaceDocBuilderDev commented 4 weeks ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.