Q: What is the expected "global batch size"?

In the recipes README there is this statement:

"If you scale up/down the number of GPUs, we recommend also scaling up the per-device batch size or number of gradient accumulation steps to keep the global batch size constant (and thus replicate our results)."

For example, I'm trying to run this on 2x 3090s and need to know what the expected global batch size is so I can adjust the gradient accumulation steps and per-device train batch size.
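To make the question concrete, here is a minimal sketch of the arithmetic I'm assuming applies (the standard relationship global batch size = per-device batch size × number of GPUs × gradient accumulation steps); the numbers are placeholders, not values from the recipe configs:

```python
# Assumed relationship (standard for Accelerate/Trainer-style multi-GPU training):
#   global_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps
# The concrete numbers below are placeholders, not values taken from the recipes.

def grad_accum_steps(target_global_batch_size: int,
                     per_device_batch_size: int,
                     num_gpus: int) -> int:
    """Gradient accumulation steps needed to reach the target global batch size."""
    effective = per_device_batch_size * num_gpus
    if target_global_batch_size % effective != 0:
        raise ValueError(
            "Target global batch size must be divisible by "
            "per_device_batch_size * num_gpus"
        )
    return target_global_batch_size // effective

# Example: hitting a hypothetical global batch size of 128 on 2x RTX 3090s,
# where memory limits the per-device batch size to 4:
print(grad_accum_steps(target_global_batch_size=128,
                       per_device_batch_size=4,
                       num_gpus=2))  # -> 16
```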
Thanks much!