Closed Liu-yuliang closed 5 months ago
Hello, thank you for your good work. I use the following bash script:

```
--batch-size 1 \
--gradient-accumulate-every 48 \
```
and this single_node.yaml:

```yaml
num_machines: 1
num_processes: 2
```
I want to know whether the global training step is 48 or 96 with sequence parallelism in your dist_flash_attn.
Sorry, I meant the global training batch size, not the global training step.
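A hedged sketch of the arithmetic behind the question: in the common convention, only the data-parallel degree multiplies the batch, while sequence-parallel ranks cooperate on the *same* sample. The function name and `sp_degree` parameter below are illustrative assumptions, not names from this repo.

```python
# Illustrative sketch (not repo code): effective global batch size when
# sequence parallelism shares each sample across a group of processes.

def global_batch_size(micro_batch, grad_accum, num_processes, sp_degree):
    # Sequence-parallel ranks split one sequence, so they do not add
    # samples; only the remaining data-parallel degree multiplies.
    dp_degree = num_processes // sp_degree
    return micro_batch * grad_accum * dp_degree

# With --batch-size 1, --gradient-accumulate-every 48, num_processes 2,
# assuming dist_flash_attn splits each sequence across both processes
# (sp_degree = 2), the data-parallel degree is 1:
print(global_batch_size(1, 48, 2, 2))  # -> 48

# For contrast, pure data parallelism (sp_degree = 1) would give:
print(global_batch_size(1, 48, 2, 1))  # -> 96
```

Under this convention, the answer hinges entirely on whether the two processes form one sequence-parallel group or two data-parallel replicas.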
solved