microsoft / Oscar

Oscar and VinVL
MIT License

Fix the deadlock when using distributed training in VQA fine-tuning #197

Open Light-V opened 2 years ago

Light-V commented 2 years ago

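When using distributed training, processes with local_rank != 0 never call torch.distributed.barrier(). Since a barrier only returns once every process in the group has reached it, the process that does call it waits forever, which causes a deadlock.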
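For anyone reading along, here is a minimal sketch of the usual "rank 0 goes first" pattern, where every distributed process calls the barrier exactly once. It is not the diff from this PR. The helper name `run_rank_zero_first` and its `barrier` parameter are hypothetical, introduced only for illustration; the one real library call is `torch.distributed.barrier()`, which assumes the process group has already been initialized.

```python
from typing import Callable


def _torch_barrier() -> None:
    # Imported lazily so the sketch can be read and exercised without PyTorch.
    # Assumes torch.distributed.init_process_group() has already been called.
    import torch.distributed as dist
    dist.barrier()


def run_rank_zero_first(local_rank: int,
                        work: Callable[[], object],
                        barrier: Callable[[], None] = _torch_barrier) -> object:
    """Hypothetical helper: run `work` on rank 0 first, then on the other ranks.

    local_rank == -1 means non-distributed training, so no barrier is used.
    """
    if local_rank not in (-1, 0):
        # Non-zero ranks wait here until rank 0 has finished `work`.
        barrier()

    result = work()

    if local_rank == 0:
        # Rank 0 reaches the barrier last and thereby releases the other ranks.
        barrier()

    return result
```

The point of the sketch is the symmetry: the two `if` branches together guarantee that each rank in a distributed run hits the barrier once. If the first branch is missing or skipped for local_rank != 0, rank 0 blocks in the second branch indefinitely, which matches the deadlock described above.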