rosinality / vq-vae-2-pytorch

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
Other
1.64k stars, 276 forks

How to do distributed training? #67

Open Dududu233 opened 3 years ago

Dududu233 commented 3 years ago

I tried running "python train_vqvae.py --path '\home\lab\ffhq_dataset'" in the terminal, but got the error "module 'torch.distributed' has no attribute 'launch'". I have read some other distributed training examples and never saw this usage: "dist.launch(main, args.n_gpu, 1, 0, args.dist_url, args=(args,))". They just run "python -m distributed.launch script.py" in the terminal. What is wrong, and how can I fix it? Looking forward to your response.
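For context, the quoted dist.launch(...) call appears to start the worker processes from inside the script, whereas the other examples use an external launcher via "python -m ...". Assuming (not confirmed in this thread) that its positional arguments follow the detectron2-style convention (function, GPUs per machine, number of machines, machine rank, dist URL), the sketch below shows how such a launcher typically derives each process's rank and the world size. `plan_workers` and `WorkerInfo` are hypothetical names used only for illustration. The sketch is stdlib-only and does not spawn anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorkerInfo:
    local_rank: int   # index of the GPU/process on this machine
    global_rank: int  # unique rank across all machines in the job
    world_size: int   # total number of processes in the job


def plan_workers(n_gpu_per_machine: int, n_machine: int = 1,
                 machine_rank: int = 0) -> List[WorkerInfo]:
    """Hypothetical helper: list the processes a launcher would start on one machine."""
    if n_gpu_per_machine < 1 or n_machine < 1:
        raise ValueError("need at least one GPU and one machine")
    if not 0 <= machine_rank < n_machine:
        raise ValueError("machine_rank must be in [0, n_machine)")
    world_size = n_gpu_per_machine * n_machine
    return [
        WorkerInfo(
            local_rank=local_rank,
            # Machines are numbered consecutively, each contributing a block of ranks.
            global_rank=machine_rank * n_gpu_per_machine + local_rank,
            world_size=world_size,
        )
        for local_rank in range(n_gpu_per_machine)
    ]


if __name__ == "__main__":
    # Roughly what dist.launch(main, 2, 1, 0, ...) would plan on a single machine.
    for worker in plan_workers(2, 1, 0):
        print(worker)
```

With one machine (the "1, 0" in the quoted call), the global rank equals the local rank and the world size equals the number of GPUs.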

Dududu233 commented 3 years ago

By the way, I am using Python 3.7, PyTorch 1.1.0, and CUDA 9.0.

Dududu233 commented 3 years ago

I found more functions that are not in the 'distributed' module, such as dist.is_primary. Did you write these functions yourself? What is their purpose?
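Helpers such as dist.is_primary are usually thin wrappers that let the same training script run with one process or many. The sketch below assumes that is_primary means "this process has rank 0" and that the rank falls back to 0 when no process group is initialized; check the repository's distributed package for its actual code. The backend is passed in explicitly so the sketch runs without PyTorch: in real use it would be torch.distributed, which provides is_available(), is_initialized() and get_rank(). `FakeBackend` is a hypothetical stand-in for testing only.

```python
from typing import Any, Optional


def get_rank(backend: Optional[Any] = None) -> int:
    """Return this process's rank, or 0 when not running distributed.

    `backend` should expose is_available(), is_initialized() and get_rank(),
    as torch.distributed does; None means a plain single-process run.
    """
    if backend is None or not backend.is_available() or not backend.is_initialized():
        return 0
    return backend.get_rank()


def is_primary(backend: Optional[Any] = None) -> bool:
    """True only on rank 0, the process that should log and save checkpoints."""
    return get_rank(backend) == 0


class FakeBackend:
    """Hypothetical stand-in for torch.distributed so the sketch runs without PyTorch."""

    def __init__(self, rank: int, initialized: bool = True) -> None:
        self._rank = rank
        self._initialized = initialized

    def is_available(self) -> bool:
        return True

    def is_initialized(self) -> bool:
        return self._initialized

    def get_rank(self) -> int:
        return self._rank
```

A typical use is guarding side effects so that only one process writes them, for example calling torch.save or printing progress only when is_primary(torch.distributed) is true.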

rosinality commented 3 years ago

They are in the repository's own distributed package: https://github.com/rosinality/vq-vae-2-pytorch/tree/master/distributed. I don't know why torch.distributed is being used instead of it.
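One way to check which module the name dist actually refers to is to import it by name and inspect where it comes from. The sketch below is a stdlib-only diagnostic; `describe_module` is a hypothetical helper, not part of the repository. The assumption is that the script must import the repository's local distributed package (which needs the repository root on sys.path, e.g. when running from that directory), because in the reporter's setup torch.distributed had no launch attribute.

```python
import importlib
from typing import Dict, Optional, Union


def describe_module(name: str = "distributed") -> Optional[Dict[str, Union[str, bool, None]]]:
    """Report where `name` is imported from and whether it has the repo's helpers.

    Returns None if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None
    return {
        # Path of the file the import resolved to (None for built-in modules).
        "file": getattr(module, "__file__", None),
        "has_launch": hasattr(module, "launch"),
        "has_is_primary": hasattr(module, "is_primary"),
    }
```

Run from the repository root, describe_module("distributed") should report a file inside the repository's distributed folder with both helpers present; if it reports something else, the script is picking up a different module than intended.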

Dududu233 commented 3 years ago

Thank you for your response. The problem is solved.

berryweinst commented 2 years ago

@Dududu233, how did you solve the problem?