jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0

about seq parallel global batch size #32

Closed Liu-yuliang closed 5 months ago

Liu-yuliang commented 6 months ago

Hello, thank you for your good work. I use the following bash script:

--batch-size 1 \
--gradient-accumulate-every 48  \

and this single_node.yaml:

num_machines: 1
num_processes: 2

I want to know whether the global training step is 48 or 96 with sequence parallelism in your dist_flash_attn.

Liu-yuliang commented 6 months ago

Sorry, I meant the global training batch size, not the global training step.
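For reference, here is a rough sketch of the usual arithmetic, not an answer from the maintainer. It assumes both processes belong to a single sequence-parallel group (so the data-parallel degree is 1), which is the typical setup when all GPUs shard one long sequence; the function name and parameters below are illustrative, not part of EasyContext's API.

```python
def global_batch_size(per_device_batch: int,
                      grad_accum_steps: int,
                      num_processes: int,
                      seq_parallel_size: int) -> int:
    """Number of distinct sequences consumed per optimizer step.

    With sequence parallelism, processes in the same group work on the
    same sequence, so only the data-parallel degree multiplies the batch.
    """
    data_parallel_degree = num_processes // seq_parallel_size
    return per_device_batch * grad_accum_steps * data_parallel_degree

# --batch-size 1, --gradient-accumulate-every 48, num_processes 2,
# and both processes in one sequence-parallel group (seq_parallel_size = 2):
print(global_batch_size(1, 48, 2, 2))  # -> 48, not 96
```

Under that assumption the two GPUs jointly process one sequence per micro-step, so the global batch size would be 48; it would only be 96 if the two processes were acting as independent data-parallel replicas.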

Liu-yuliang commented 5 months ago

Solved.