Closed zjuPeco closed 1 year ago
Training works fine on one GPU, but hangs at
dist.barrier(device_ids=[torch.cuda.current_device()])
on two GPUs. Any advice?
Solved. It was a problem with my machine.