Stuck in dist_util.setup_dist() when trying multi-GPU training

openai / guided-diffusion

MIT License


Stuck in dist_util.setup_dist() when trying multi-GPU training #63

Closed fido20160817 closed 2 years ago

fido20160817 commented 2 years ago

I use python -m torch.distributed.launch --nproc_per_node=4 ... to train on multiple GPUs, but it gets stuck in "dist_util.setup_dist()", which is the first statement in main. Does anybody know the reason?

fido20160817 commented 2 years ago

Remember to put the model on CUDA first.
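
A minimal sketch of what "put the model on CUDA first" could look like, in case it helps someone hitting the same hang. The names pick_device and place_model are hypothetical helpers, not part of guided-diffusion, and reading LOCAL_RANK from the environment is an assumption about the launcher (older torch.distributed.launch versions pass --local_rank as an argument instead). The idea is only that each process selects its own GPU and moves the model there before any distributed wrapping or parameter sync.

```python
import os


def pick_device(local_rank: int, cuda_available: bool, num_gpus: int) -> str:
    """Return the device string for this process (hypothetical helper)."""
    if not cuda_available or num_gpus == 0:
        return "cpu"
    # Wrap around so every local rank maps to one of the visible GPUs.
    return f"cuda:{local_rank % num_gpus}"


def place_model(model):
    """Move the model to this process's device before any distributed wrapping."""
    import torch  # imported lazily so pick_device works without PyTorch installed

    # Assumption: the launcher exports LOCAL_RANK for each worker process.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device(
        pick_device(local_rank, torch.cuda.is_available(), torch.cuda.device_count())
    )
    if device.type == "cuda":
        # Make this GPU the default for the current process.
        torch.cuda.set_device(device)
    return model.to(device), device
```

Calling place_model(model) right after building the model, and before wrapping it in DistributedDataParallel or syncing parameters, keeps each rank on its own GPU. Whether this alone resolves the hang will depend on the setup.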