young-geng / EasyLM

Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax.
Apache License 2.0

When I increase the accumulate_gradient_steps, can the batch_size also be increased accordingly? #30

Closed joytianya closed 1 year ago

joytianya commented 1 year ago

If this is a normal configuration:

```
... \
--train_dataset.json_dataset.batch_size=4 \
--optimizer.bf16_accumulate_gradient=True \
--optimizer.accumulate_gradient_steps=1 \
...
```

Then, when I increase accumulate_gradient_steps, is it right to also increase the batch size like this?

```
... \
--train_dataset.json_dataset.batch_size=8 \
--optimizer.bf16_accumulate_gradient=True \
--optimizer.accumulate_gradient_steps=2 \
...
```

young-geng commented 1 year ago

The effective batch size after gradient accumulation is accumulate_gradient_steps * batch_size, so you don't need to change batch_size when you increase accumulate_gradient_steps. In your second example, batch_size=8 with accumulate_gradient_steps=2 gives an effective batch size of 16, not 8.
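To make the arithmetic concrete, here is a minimal sketch of gradient accumulation using optax.MultiSteps. This is not EasyLM's actual training loop; the toy loss_fn, the adamw learning rate, and the parameter shapes are assumptions chosen for illustration only.

```python
# Minimal sketch (not EasyLM's code): gradient accumulation with optax.MultiSteps.
import jax
import jax.numpy as jnp
import optax

accumulate_gradient_steps = 2  # mirrors --optimizer.accumulate_gradient_steps
batch_size = 4                 # per-micro-batch size, mirrors batch_size flag

# Wrap any optimizer so parameters are only updated once every
# `accumulate_gradient_steps` micro-batches; in between, gradients are
# averaged (use_grad_mean defaults to True).
optimizer = optax.MultiSteps(optax.adamw(1e-4),
                             every_k_schedule=accumulate_gradient_steps)

params = {'w': jnp.zeros((8,))}  # hypothetical toy parameters
opt_state = optimizer.init(params)

def loss_fn(params, batch):
    # Toy loss: mean squared activation over the micro-batch.
    return jnp.mean((batch @ params['w']) ** 2)

for step in range(accumulate_gradient_steps):
    batch = jnp.ones((batch_size, 8))  # one micro-batch of batch_size examples
    grads = jax.grad(loss_fn)(params, batch)
    # On intermediate steps MultiSteps returns zero updates, so apply_updates
    # is a no-op; on the final step it applies one real optimizer update.
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)

# After the loop, exactly one real update has been applied, computed from
# accumulate_gradient_steps * batch_size = 8 examples in total.
```

The point of the sketch is that the single real update already averages over accumulate_gradient_steps * batch_size examples, which is why increasing accumulate_gradient_steps alone is enough to grow the effective batch.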