artucalvo opened this issue 3 years ago
This happens when you use the AMP flag. I found the same in the StyleGAN2-PyTorch implementation when using the fp16 flag there, so it seems the models collapse quite quickly after initialization. It works just fine when AMP is omitted, albeit slower and more memory-intensive.
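For anyone debugging this, here is a minimal sketch of how mixed-precision training is usually wired up in PyTorch with `torch.cuda.amp` — note this is a generic example, not this repo's actual trainer, and the model, optimizer, and batch shapes are placeholders. `GradScaler` already skips optimizer steps whose gradients contain inf/NaN, so an explicit finiteness check on the loss can catch a collapse early:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model and optimizer -- not this repo's actual trainer.
model = torch.nn.Linear(512, 1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scaler = GradScaler()  # rescales gradients to avoid fp16 under/overflow

for step in range(1000):
    z = torch.randn(16, 512, device='cuda')  # placeholder batch
    optimizer.zero_grad()
    with autocast():  # ops run in fp16 where it is considered safe
        loss = model(z).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # silently skipped if grads contain inf/NaN
    scaler.update()         # adjusts the loss scale accordingly
    if not torch.isfinite(loss):
        print(f'non-finite loss at step {step}; stopping before it spreads')
        break
```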
I get the same problem without AMP; afterwards it always tries to load from checkpoint 0 (I think this may be a logging error). The GAN in general seems to be highly unstable.
I have tried running the algorithm on Colab with different datasets (256 and 512 px), batch sizes (16, 32), aug probabilities (0.25, 0.40), and gradient_accumulate_every values (4, 2, 1). However, training always falls into the NaN loop in under an hour.
Here is one example run, where the GP (gradient penalty) loss quickly climbs to 10.00. Any thoughts on what is going on?
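In case it helps others catch this earlier, a small sketch of a NaN watchdog one could bolt onto the loop — `g_loss`, `d_loss`, `gp`, and `load_checkpoint` are hypothetical names for illustration, not this repo's API:

```python
import math

def losses_are_finite(*losses):
    # Returns False as soon as any loss is NaN/inf, so training can
    # stop or roll back before the collapse propagates to the weights.
    return all(math.isfinite(float(l)) for l in losses)

# Hypothetical usage inside the training loop:
# if not losses_are_finite(g_loss, d_loss, gp):
#     load_checkpoint(last_good_step)  # placeholder rollback helper
#     continue
```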