jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0

about seq parallel global batch size #32

Closed Liu-yuliang closed 5 months ago

Liu-yuliang commented 6 months ago

Hello, thank you for your good work. I use the following bash script:

--batch-size 1 \
--gradient-accumulate-every 48  \

and this single_node.yaml:

num_machines: 1
num_processes: 2

I want to know whether the global training step is 48 or 96 with sequence parallelism in your dist_flash_attn.

Liu-yuliang commented 6 months ago

Sorry, I meant the global training batch size, not the global training step.
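For reference, here is a rough sketch of the usual arithmetic, not an answer from the maintainer. It assumes both processes belong to a single sequence-parallel group (so the data-parallel degree is 1), which is the typical setup when all GPUs shard one long sequence; the function name and parameters below are illustrative, not part of EasyContext's API.

```python
def global_batch_size(per_device_batch: int,
                      grad_accum_steps: int,
                      num_processes: int,
                      seq_parallel_size: int) -> int:
    """Number of distinct sequences consumed per optimizer step.

    With sequence parallelism, processes in the same group work on the
    same sequence, so only the data-parallel degree multiplies the batch.
    """
    data_parallel_degree = num_processes // seq_parallel_size
    return per_device_batch * grad_accum_steps * data_parallel_degree

# --batch-size 1, --gradient-accumulate-every 48, num_processes 2,
# and both processes in one sequence-parallel group (seq_parallel_size = 2):
print(global_batch_size(1, 48, 2, 2))  # -> 48, not 96
```

Under that assumption the two GPUs jointly process one sequence per micro-step, so the global batch size would be 48; it would only be 96 if the two processes were acting as independent data-parallel replicas.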

Liu-yuliang commented 5 months ago

Solved.