DachengLi1 closed this issue 1 year ago
Hi, apologies for not including the distributed training command; this PR adds it.
It should work if you add `python -m torch.distributed.launch --nproc_per_node=<number_of_gpus>`
to the train command. Let me know if you still have issues.
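Concretely, the full command would look something like the sketch below. Note that `train.py` and the GPU count are placeholders, not the repo's actual script name or settings:

```shell
# Launch one training process per GPU (here assuming 4 GPUs).
# Replace train.py and any trailing arguments with your actual
# training script and its flags.
python -m torch.distributed.launch --nproc_per_node=4 train.py
```

On newer PyTorch versions, `torchrun --nproc_per_node=4 train.py` is the recommended replacement for the (now deprecated) `torch.distributed.launch` module.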
Did that work for you? If so, we can probably close the issue.
Hi there, thanks a lot for the great script. However, I'm seeing a weird behavior where the batch size effectively determines the number of GPUs used: when I set batch_size=2, only 2 GPUs are used; when I set batch_size=4, 4 GPUs are used, even though all 4 GPUs are visible to PyTorch. Have you run into a similar issue before? Thanks!
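One possible explanation (an assumption, since the script isn't shown here): if the script falls back to `torch.nn.DataParallel` instead of true multi-process distributed training, this behavior is expected, because DataParallel splits each batch across the visible GPUs, so a batch of 2 samples only produces work for 2 of the 4 devices. A minimal plain-Python sketch of that scatter logic (no torch required; the function name is illustrative):

```python
def scatter_batch(batch_size, num_gpus):
    """Return how many samples each GPU receives when a batch is
    split DataParallel-style across the visible GPUs."""
    base, remainder = divmod(batch_size, num_gpus)
    # The first `remainder` GPUs get one extra sample; GPUs beyond
    # the batch size receive nothing and sit idle.
    return [base + (1 if i < remainder else 0) for i in range(num_gpus)]

# With 4 visible GPUs but batch_size=2, only 2 GPUs get work:
print(scatter_batch(2, 4))  # [1, 1, 0, 0]
print(scatter_batch(4, 4))  # [1, 1, 1, 1]
```

If that is what's happening, launching with `torch.distributed.launch` (one process per GPU, each with its own per-device batch) rather than relying on DataParallel should keep all GPUs busy regardless of the per-step batch size.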