lucidrains / lightweight-gan

Implementation of 'lightweight' GAN, proposed in ICLR 2021, in PyTorch. High-resolution image generation that can be trained within a day or two.
MIT License

Bug fix/fake progress #91

Closed (anomal closed 3 years ago)

anomal commented 3 years ago

Fix a bug where the progress display keeps advancing even though training is actually stuck in an infinite crash loop, caused by a NanException thrown before checkpoint 1 is saved.

Example: with save_every = 1000, a NanException is thrown around iteration 700. The step count keeps climbing into the thousands and the progress bar shows training getting closer to completion, but the model never gets close to converging, because each crash happens before checkpoint 1 is saved and there is no saved progress to resume from.
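The failure mode can be sketched with a toy loop. This is not the code from lightweight_gan.py or from this PR; it is a minimal simulation under stated assumptions: `run`, `reset_steps_on_nan`, `NAN_AT`, `TOTAL_STEPS`, and the restart cap are hypothetical names and values, and "real progress" just counts the steps the model has retained since its last restart.

```python
import math

SAVE_EVERY = 1000   # checkpoint interval from the example above
NAN_AT = 700        # assumption: loss turns NaN after 700 retained steps
TOTAL_STEPS = 3000  # assumption: total training budget
MAX_RESTARTS = 10   # assumption: cap so the fixed variant terminates


def run(reset_steps_on_nan):
    """Return (reported_steps, best_real_progress) for a toy training loop."""
    steps = 0          # counter the progress bar would display
    real_progress = 0  # steps actually retained by the model
    best = 0
    restarts = 0
    while steps < TOTAL_STEPS and restarts < MAX_RESTARTS:
        steps += 1
        real_progress += 1
        loss = math.nan if real_progress >= NAN_AT else 1.0
        if math.isnan(loss):
            # NAN_AT < SAVE_EVERY, so no checkpoint exists yet:
            # the model starts over from scratch.
            real_progress = 0
            restarts += 1
            if reset_steps_on_nan:
                steps = 0  # hypothetical fix: counter follows the model
            continue
        best = max(best, real_progress)
    return steps, best


print(run(reset_steps_on_nan=False))  # counter reaches the full budget
print(run(reset_steps_on_nan=True))   # counter never gets past the crash point
```

In the first call the displayed counter reaches 3000 although the model never retains more than 699 steps, which is the "fake progress" described above. In the second call the counter is tied to what the model actually retains, so the crash loop is visible instead of hidden.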

anomal commented 3 years ago

To quickly simulate a NanException being thrown, temporarily change this line:

https://github.com/lucidrains/lightweight-gan/blob/169e0698f458972b7fd5d78f5b4c50fb54d041ce/lightweight_gan/lightweight_gan.py#L1199

to

        if (self.steps > 1 and self.steps % 51 == 0) or any(torch.isnan(l) for l in (total_gen_loss, total_disc_loss)):
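To see which steps the modified condition fires on, the predicate can be checked in isolation. This is a torch-free sketch: `should_raise_nan` and `FORCE_EVERY` are hypothetical names, and `math.isnan` on plain floats stands in for `torch.isnan` on the loss tensors.

```python
import math

FORCE_EVERY = 51  # same interval as in the modified line above


def should_raise_nan(steps, losses, force_every=FORCE_EVERY):
    """Mirror the modified condition: forced trigger OR a real NaN loss."""
    forced = steps > 1 and steps % force_every == 0
    return forced or any(math.isnan(l) for l in losses)


# With healthy losses, only the forced trigger fires.
trigger_steps = [s for s in range(1, 160) if should_raise_nan(s, (0.5, 0.7))]
print(trigger_steps)

# A genuine NaN loss still triggers on any step.
print(should_raise_nan(10, (math.nan, 0.7)))
```

With any save_every larger than 51, the forced trigger fires before the first checkpoint is written, which reproduces the crash loop within a few seconds of training.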