Light-V opened this issue 2 years ago (status: Open)
When using distributed training, the processes with local_rank != 0 never call torch.distributed.barrier(). This causes a deadlock: a barrier is a collective operation that returns only after every rank in the process group has entered it, so the rank-0 process blocks at the barrier forever waiting for the other ranks to arrive.
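To illustrate the failure mode without needing a multi-process torch.distributed setup, here is a minimal sketch using Python's stdlib threading.Barrier, which has the same all-participants semantics as torch.distributed.barrier(): it releases only once every party has arrived. The worker function and the rank gating are hypothetical stand-ins for the buggy training-script pattern, not the actual code from this repository.

```python
import threading

# Correct pattern: a barrier for 2 "ranks"; BOTH call wait(), so both finish.
barrier = threading.Barrier(2)
results = {}

def worker(rank):
    barrier.wait()          # every rank participates in the collective
    results[rank] = "done"

threads = [threading.Thread(target=worker, args=(r,)) for r in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
print(results)  # both ranks completed

# Buggy pattern (as in the issue): only "rank 0" enters the barrier,
# because the call is gated on local_rank == 0. Rank 1 never arrives,
# so rank 0 stays blocked inside wait() indefinitely.
bad_barrier = threading.Barrier(2)
rank0 = threading.Thread(target=bad_barrier.wait, daemon=True)
rank0.start()
rank0.join(timeout=0.5)
print(rank0.is_alive())  # True: rank 0 is stuck at the barrier
```

The fix is the same in torch.distributed: the barrier() call must be executed unconditionally on every rank in the process group, never inside an `if local_rank == 0:` branch.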