Fix the deadlock when using distributed training in VQA finetune

microsoft / Oscar

Open Light-V opened 2 years ago

Light-V commented 2 years ago

When using distributed training, processes with local_rank != 0 never call torch.distributed.barrier(). Because a barrier only returns once every process in the group has reached it, this causes a deadlock.
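
For illustration only, here is a minimal sketch of the common "rank 0 goes first" pattern, in which every distributed rank reaches the barrier exactly once. It is not the change made in this PR. The name rank_zero_first and its barrier parameter are hypothetical; the barrier is passed in as a callable (in real training it would be torch.distributed.barrier) so the logic can be checked without initializing a process group.

```python
from contextlib import contextmanager


@contextmanager
def rank_zero_first(local_rank, barrier):
    """Run the wrapped block on rank 0 before the other ranks.

    Hypothetical helper. `barrier` is any zero-argument callable;
    in a real job it would be torch.distributed.barrier.
    local_rank == -1 means a non-distributed run (no barrier at all).
    """
    if local_rank not in (-1, 0):
        # Non-zero ranks wait here until rank 0 reaches its own barrier.
        barrier()
    try:
        yield
    finally:
        if local_rank == 0:
            # Rank 0 has finished the block; release the waiting ranks.
            barrier()


if __name__ == "__main__":
    # Stand-in barrier that only records calls, so this runs in one process.
    for rank in (-1, 0, 1):
        calls = []
        with rank_zero_first(rank, lambda: calls.append("barrier")):
            calls.append("work")
        print(rank, calls)
```

In a training script this might be used as `with rank_zero_first(args.local_rank, torch.distributed.barrier): ...` around dataset or model loading. The point of the sketch is that the two barrier calls are paired: if either branch is missing, the ranks on the other side wait forever.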