dist training spawn no rank args

lucidrains / lightweight-gan

Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two

MIT License


dist training spawn no rank args #114

Closed: jayagami closed this issue 2 years ago

jayagami commented 2 years ago

The training process looks fine, but I noticed that the rank parameter is not passed in the mp.spawn(run_training, ...) call on line 185. Is this correct?

https://github.com/lucidrains/lightweight-gan/blob/b7c34d587d029177ddc641f42b2604506352dfb2/lightweight_gan/cli.py#L180-L187

iScriptLex commented 2 years ago

Yes, it's correct. torch.multiprocessing.spawn calls run_training as run_training(ind, *args), where ind is the process index, which spawn supplies itself (see the PyTorch documentation: https://pytorch.org/docs/stable/multiprocessing.html ). That's why the function has one more parameter than the number of arguments passed to it on line 185.
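To make the calling convention concrete, here is a minimal sketch. It is not the repository's code: spawn_like is a hypothetical single-process stand-in that only mimics how spawn prepends the process index, and this run_training and its parameters are invented for illustration.

```python
# Hypothetical stand-in for torch.multiprocessing.spawn. It reproduces only the
# calling convention fn(i, *args); it does not start any processes.
def spawn_like(fn, args=(), nprocs=1):
    results = []
    for i in range(nprocs):
        # The index i is prepended to the user-supplied args.
        results.append(fn(i, *args))
    # Note: the real spawn does not collect return values; this is for the demo.
    return results


# Illustrative entry point: 'rank' is the extra first parameter filled in by
# the spawner, not by the caller.
def run_training(rank, world_size, name):
    return f"{name}: rank {rank} of {world_size}"


world_size = 2
# Only two values are passed in args, yet run_training takes three parameters.
outputs = spawn_like(run_training, args=(world_size, "demo"), nprocs=world_size)
for line in outputs:
    print(line)

# With PyTorch installed, the real call has the same shape:
#   import torch.multiprocessing as mp
#   mp.spawn(run_training, args=(world_size, "demo"), nprocs=world_size, join=True)
```

The args tuple always has one element fewer than the function's parameter list, because the first slot is reserved for the process index.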

jayagami commented 2 years ago

Thanks for your reply. Sorry I missed that; I've been a little distracted lately 😥