Excellent work! : )
But I hit a bug: when I run the first_stage code on multiple GPUs, it blocks at the line below. I found the issue is caused by model desynchronization across ranks.
criterion = torch.nn.parallel.DistributedDataParallel(criterion, device_ids=[device], broadcast_buffers=False, find_unused_parameters=True)