taki0112 / StyleGAN-Tensorflow

Simple & Intuitive Tensorflow implementation of StyleGAN (CVPR 2019 Oral)

Discriminator getting too good approaching last upsamples. #9

Open rfitz123 opened 4 years ago

rfitz123 commented 4 years ago

My results were near perfect up until 256x256. They were looking really, really good, but I could see the discriminator slowly pulling ahead of the generator. By 512x512, the images were turning into blobby shapes and the losses were averaging something like D: 0.05, G: 8.

Let it be known that I am training on shoes rather than faces, so I don't expect it to work as well as it would on a huge, pristine dataset. However, any ideas on how to combat this?

I have two ideas. My first idea is to introduce more noise. I have found the existing noise function, but I am not sure how to adjust it to increase the magnitude of noise. I am going to try doubling the noise by changing `x = x + noise * weight` to `x = x + noise * weight * 2`. I can't tell if this is even the right way to modify that.
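For concreteness, here is a rough TF2-style sketch of the kind of noise-injection layer I mean, with an explicit extra multiplier (the `NoiseInjection` name and the `strength` argument are illustrative, not this repo's actual TF1 code):

```python
import tensorflow as tf

class NoiseInjection(tf.keras.layers.Layer):
    """StyleGAN-style noise: x + noise * (per-channel learned weight).

    `strength` is a hypothetical extra multiplier; 1.0 reproduces the
    usual behavior, 2.0 is the doubling proposed above.
    """

    def __init__(self, strength=1.0, **kwargs):
        super().__init__(**kwargs)
        self.strength = strength

    def build(self, input_shape):
        # One learned scale per feature map (NHWC layout assumed).
        self.weight = self.add_weight(
            name="noise_weight",
            shape=(input_shape[-1],),
            initializer="zeros",
            trainable=True,
        )

    def call(self, x):
        # Single-channel Gaussian noise, broadcast across channels.
        shape = tf.shape(x)
        noise = tf.random.normal([shape[0], shape[1], shape[2], 1])
        return x + noise * self.weight * self.strength
```

One caveat: since `weight` is trainable (and initialized to zero), a constant multiplier mostly just rescales what the optimizer learns, so a fixed doubling may wash out over training.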

My second idea is to train the generator more times per iteration than the discriminator. I know that is the opposite of what usually works, but it seems my model needs it. Has anybody tried this, and if so, how did it go?
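In loop form, the idea looks something like this (tiny stand-in networks instead of StyleGAN's actual G and D, and `N_GEN_STEPS` is a hypothetical knob, not a setting from this repo):

```python
import tensorflow as tf

# Tiny stand-in networks; the real StyleGAN G/D would go here.
G = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                         tf.keras.layers.Dense(2)])
D = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                         tf.keras.layers.Dense(1)])

g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

N_GEN_STEPS = 2  # generator updates per discriminator update

def d_step(real):
    z = tf.random.normal([tf.shape(real)[0], 8])
    with tf.GradientTape() as tape:
        real_logits = D(real)
        fake_logits = D(G(z))
        loss = (bce(tf.ones_like(real_logits), real_logits)
                + bce(tf.zeros_like(fake_logits), fake_logits))
    grads = tape.gradient(loss, D.trainable_variables)
    d_opt.apply_gradients(zip(grads, D.trainable_variables))

def g_step(batch_size):
    z = tf.random.normal([batch_size, 8])
    with tf.GradientTape() as tape:
        fake_logits = D(G(z))
        loss = bce(tf.ones_like(fake_logits), fake_logits)
    grads = tape.gradient(loss, G.trainable_variables)
    g_opt.apply_gradients(zip(grads, G.trainable_variables))

for step in range(1000):
    real_batch = tf.random.normal([16, 2])  # placeholder for a real data batch
    d_step(real_batch)
    for _ in range(N_GEN_STEPS):  # G trains more often than D
        g_step(16)
```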

I am pretty confident the model can do better because, although my dataset is small, the images are perfectly formatted: high quality, each shoe on a white background, with the same positioning and orientation throughout.

Thanks.

aydao commented 4 years ago

With the losses you report, the run is diverging, and it is unlikely to recover. Hopefully you have an earlier checkpoint around and can resume training from a point that was more balanced/stable.

I have also found that D outpaces G quite frequently in StyleGAN. I apply the simple heuristic of lowering D's learning rate, usually setting it to around one third of G's, though you may need to tweak it for your domain.
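Concretely, that heuristic is just two optimizers with different rates; a minimal TF2-style sketch (the beta values follow the StyleGAN paper's Adam settings):

```python
import tensorflow as tf

g_lr = 3e-4
# Heuristic above: D's learning rate at roughly one third of G's.
# beta_1=0.0, beta_2=0.99 are the Adam settings from the StyleGAN paper.
g_opt = tf.keras.optimizers.Adam(learning_rate=g_lr, beta_1=0.0, beta_2=0.99)
d_opt = tf.keras.optimizers.Adam(learning_rate=g_lr / 3.0,
                                 beta_1=0.0, beta_2=0.99)
```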

rfitz123 commented 4 years ago

When adjusting D's learning rate, I noticed that the LR increases as the network upsamples. Aren't learning rates usually supposed to decrease as training gets closer to convergence?

Regardless, I've set the discriminator to update 1/4 as often as the generator and stopped D's LR from increasing, so it stays at about 1/3 to 1/2 of G's LR. Will update with results for others.

aydao commented 4 years ago

Yes, typically you would want to decrease the learning rate gradually to converge to a local minimum. However, the original StyleGAN code specifies multiple learning rates that increase during progressive growing; the StyleGAN and earlier ProGAN papers discuss the reasoning behind their choice of learning rates.

Here's how I view it. At earlier levels, you train fast on the small resolutions, so you can afford a tiny learning rate. Training slows down as you progressively upsample. Given that you are training on the same GPU resources regardless of resolution, it makes practical sense in terms of time/compute to increase the LR and combat the overall slowdown in learning. The risk is converging to a less-than-ideal point, though LR balancing in GANs is a dark art anyway, without an accepted best practice or theory in the literature. Anecdotally, the Nvidia authors found 0.003 to work well at the higher resolutions, yet in my models (with far less high-quality data), I found training with 0.0003 to be more reasonable.
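For reference, the official schedule is keyed by resolution; a sketch in that spirit (the exact values in NVlabs' train.py depend on config and GPU count, so treat these as illustrative):

```python
# Learning rate keyed by current resolution during progressive growing,
# in the spirit of the official NVlabs schedule (values illustrative).
G_LRATE_DICT = {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003}

def lrate_for(resolution, base_lr=0.001):
    # Resolutions below 128 fall back to the base rate.
    return G_LRATE_DICT.get(resolution, base_lr)

print(lrate_for(512))  # 0.003 at high resolution, as discussed above
print(lrate_for(64))   # falls back to base_lr
```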

rfitz123 commented 4 years ago

@aydao LR of 0.0003 for the discriminator or the generator?

aydao commented 4 years ago

Both, or with the generator set to 0.0003 and the discriminator even lower. Note that for transfer learning, for example fine-tuning from FFHQ to a new dataset, I've found it important to keep the learning rates at 0.003 initially, until results start looking reasonable; after that, it helps to lower the learning rates so the model learns fine detail.
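As a schedule, that amounts to a simple two-phase rule; a hypothetical sketch (the switch point is a judgment call made by inspecting samples, not a fixed number):

```python
# Hypothetical two-phase LR schedule for fine-tuning (e.g. FFHQ -> new data):
# stay at 0.003 until samples look reasonable, then drop to 0.0003 for detail.
FINE_TUNE_SWITCH = 200_000  # images shown; choose by inspecting samples

def transfer_lr(images_seen):
    return 0.003 if images_seen < FINE_TUNE_SWITCH else 0.0003
```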