soumith / ganhacks

starter from "How to Train a GAN?" at NIPS2016

G loss increases, what does this mean? #14

Open wagamamaz opened 7 years ago

wagamamaz commented 7 years ago

Hi, I am training a conditional GAN. At the beginning, both the G and D losses decrease, but around epoch 200 the G loss starts to increase from 1 to 3, and the image quality seems to stop improving.

Any ideas? Thank you in advance.

zhangqianhui commented 7 years ago

It's hard to say!

LukasMosser commented 7 years ago

OK, this is for an unconditional boilerplate GAN. What I found when the G loss increased was that: a) it was accompanied by a decrease in the D loss - essentially G starts diverging; b) image quality still improved, subtly, but it did.

vijayvee commented 7 years ago

I think the discriminator got too strong relative to the generator. Beyond this point, the generator finds it almost impossible to fool the discriminator, hence the increase in its loss. I'm facing a similar problem.

LukasMosser commented 7 years ago

Have you tried label smoothing @vijayvee ?

vijayvee commented 7 years ago

No I haven't tried it yet @LukasMosser

dugarsumit commented 7 years ago

I am facing a similar problem while training InfoGAN on the SVHN dataset. Any suggestions on how to overcome this?

[image: infogan_loss]

cianeastwood commented 7 years ago

I am also facing a similar problem with InfoGAN on a different dataset. Any suggestions?

zsdonghao commented 7 years ago

In my experience, when the D loss decreases to a small value (0.1 to 0.2) and the G loss increases to a high value (2 to 3), it means training has finished, as the generator cannot be improved any further.

But if the D loss drops to a small value within just a few epochs, it means training has failed, and you may need to check the network architecture.

ezamyatin commented 7 years ago

I have the same problem. When I train a GAN, I expect that by the end of training (in the limit) G will always fool D. But in practice I face the following problem: at the beginning of training, G learns correctly - it produces good images with the necessary conditions. But after some point G starts to diverge, and in the end it produces only random noise. Why does this happen?

ezamyatin commented 7 years ago

Probably the problem is that the discriminator overfits. One thing that can lead to this: the discriminator may "notice" that images from the true distribution are matrices of numbers of the form n/255. So adding Gaussian noise to the input images may help avoid the problem. It helped in my case.

LukasMosser commented 7 years ago

Label switching has also helped for me.

Two updates of discriminator with real_label = 1, fake_label=0 and one update with real_label=0 and fake_label=1.

This is followed by one generator update with real_label = 1 and fake_label = 0.
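For reference, a rough PyTorch-style sketch of the schedule described above (the models, optimizers, and the `noise_fn` latent sampler are placeholders, not something specified in this thread):

```python
import torch
import torch.nn.functional as F

def d_update(D, G, d_opt, real, z, real_label, fake_label):
    # One discriminator step with a chosen label assignment.
    d_opt.zero_grad()
    fake = G(z).detach()
    real_logits, fake_logits = D(real), D(fake)
    loss = (F.binary_cross_entropy_with_logits(real_logits, torch.full_like(real_logits, real_label))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.full_like(fake_logits, fake_label)))
    loss.backward()
    d_opt.step()

def g_update(D, G, g_opt, z):
    # The generator step keeps the normal convention: fakes should be classified as real (1).
    g_opt.zero_grad()
    fake_logits = D(G(z))
    loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    loss.backward()
    g_opt.step()

def train_step(D, G, d_opt, g_opt, real, noise_fn):
    # Two D updates with real_label=1 / fake_label=0, one D update with the labels switched,
    # followed by one G update with real_label=1 / fake_label=0.
    d_update(D, G, d_opt, real, noise_fn(), real_label=1.0, fake_label=0.0)
    d_update(D, G, d_opt, real, noise_fn(), real_label=1.0, fake_label=0.0)
    d_update(D, G, d_opt, real, noise_fn(), real_label=0.0, fake_label=1.0)  # switched labels
    g_update(D, G, g_opt, noise_fn())
```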

shijx12 commented 7 years ago

Label smoothing helped for me.
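For anyone wondering what "label smoothing" means concretely here: the usual GAN variant is one-sided label smoothing, i.e. train D against a softened real target (e.g. 0.9, in the spirit of the soft-label tip in this repo's README) while leaving the fake target at 0. A minimal PyTorch-style sketch (the 0.9 value is just a common choice, not something prescribed in this thread):

```python
import torch
import torch.nn.functional as F

def d_loss_with_label_smoothing(real_logits, fake_logits, smooth=0.9):
    # One-sided label smoothing: real targets are 0.9 instead of 1.0,
    # fake targets stay at 0.0, so D never becomes perfectly confident on real data.
    real_targets = torch.full_like(real_logits, smooth)
    fake_targets = torch.zeros_like(fake_logits)
    return (F.binary_cross_entropy_with_logits(real_logits, real_targets)
            + F.binary_cross_entropy_with_logits(fake_logits, fake_targets))
```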

Howie-hxu commented 6 years ago

Adding Gaussian noise helped for me.

avisekiit commented 6 years ago

@Howie-hxu and @EvgenyZamyatin: I saw that adding Gaussian noise in the discriminator helped in your case. I have a few questions:

  1. What did you keep as the mean and variance of the Gaussian noise?
  2. Did you apply Gaussian noise in each layer of the discriminator? Let's say we are using a DCGAN architecture.
  3. Do you apply the noise layer after the activation or before the convolution?
  4. If I am using TensorFlow, how do you implement that?

Keenly waiting for your help! Thanks, Avisek

SHANKARMB commented 6 years ago

Same doubt here

17Skye17 commented 6 years ago

Same doubts as yours. @avisekiit

ahmed-fau commented 6 years ago

I have used the idea of instance noise described here. My experiment was to add the Gaussian noise only to the input tensor of the discriminator. It was zero mean, and its standard deviation decayed from 0.1 to 0 over the mini-batch iterations. This improved the results considerably on the MNIST dataset.
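A PyTorch-style sketch of that scheme, assuming a simple linear decay of the noise level (the schedule and function names are illustrative; only the discriminator inputs are perturbed):

```python
import torch

def instance_noise_std(step, total_steps, start_std=0.1):
    # Linearly decay the noise stddev from start_std down to 0 over training.
    return start_std * max(0.0, 1.0 - step / total_steps)

def add_instance_noise(x, std):
    # Zero-mean Gaussian noise added to every batch fed to the discriminator,
    # applied to both real and generated images.
    if std <= 0:
        return x
    return x + std * torch.randn_like(x)

# Inside the training loop (real and fake are image batches):
# std = instance_noise_std(step, total_steps)
# d_real = D(add_instance_noise(real, std))
# d_fake = D(add_instance_noise(fake.detach(), std))
```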

17Skye17 commented 6 years ago

Thank you! I'll try it @ahmed-fau

phamnam95 commented 6 years ago

[image: loss]

Hello. I am training CycleGAN and my losses look like the attached picture. The discriminator loss decreases but the generator loss fluctuates. I do not quite understand the reason. Does anyone have any suggestions? Thanks.

robot010 commented 6 years ago

Adding noise to the input seems to help. To be specific, I am implementing it in TensorFlow (1.x) by adding: input = input + tf.random_normal(shape=tf.shape(input), mean=0.0, stddev=0.1, dtype=tf.float32)

bjgoncalves commented 6 years ago

I agree that adding noise on the discriminator side does help your generator loss to decrease. @ahmed-fau suggested very good tips.

lppier commented 6 years ago

Hi, I tried what you guys did, adding Gaussian noise to the input of the discriminator. It does improve the loss curves, but the test images generated by the generator now come out as noise as well. (Previously I had relatively OK images, but my generator loss was going up.)

Thoughts?

davesean commented 5 years ago

> Hi, I tried what you guys did, adding Gaussian noise to the input of the discriminator. It does improve the loss curves, but the test images generated by the generator now come out as noise as well. (Previously I had relatively OK images, but my generator loss was going up.)
>
> Thoughts?

Did you also decay the noise after a while?

hi0001234d commented 5 years ago

@EvgenyZamyatin adding noise to input helped, great thanks

aradhyamathur commented 5 years ago

I am facing a similar problem while using WGAN-GP. The generator initially produces good results but seems to diverge after some time; the discriminator loss suddenly dips, the discriminator becomes very powerful, and the generator starts outputting random noise. What can be done instead of label smoothing, since I am using WGAN?

ljuvela commented 5 years ago

@aradhyamathur you could try adding a penalty term on the discriminator output magnitude, similarly to https://github.com/tkarras/progressive_growing_of_gans

This helps prevent a training dynamic where the models engage in a "magnitudes race" and eventually lose any meaningful learning signal.
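A hedged PyTorch-style sketch of such a penalty for a WGAN-GP critic, i.e. a small "drift" term on the critic's real-sample outputs (the 0.001 weight follows the progressive-growing paper's default; the gradient_penalty argument is assumed to be computed as usual on interpolated samples):

```python
def critic_loss_with_drift(real_scores, fake_scores, gradient_penalty,
                           gp_weight=10.0, drift_weight=0.001):
    # Standard WGAN-GP critic objective...
    wgan_loss = fake_scores.mean() - real_scores.mean()
    # ...plus a small penalty on the magnitude of the real scores, which keeps the
    # critic's raw outputs from drifting to huge values (the "magnitudes race").
    drift = drift_weight * (real_scores ** 2).mean()
    return wgan_loss + gp_weight * gradient_penalty + drift
```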

LukasMosser commented 5 years ago

@phamnam95 That looks like a typical CycleGAN loss. What is your batch size? If it is 1 or 2, there will be a lot of fluctuation in your objective function. I've seen it before; it looks pretty normal to me.

phamnam95 commented 5 years ago

@LukasMosser My batch size is 1. After adding some more constraints, such as the identity loss and the self-distance loss, and also semi-supervising the CycleGAN with paired images, the generator loss decreases, but only very slowly - after 200 epochs the trend is still downward. The discriminator loss decreases until a certain epoch and then starts to fluctuate. What do you think would help? What batch size do you think is appropriate?
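For reference, the identity term mentioned here is commonly implemented as an L1 penalty that asks each generator to leave images from its own target domain unchanged. A minimal sketch (the generator names and the weight are illustrative, not taken from this thread):

```python
import torch.nn.functional as F

def identity_loss(G_A2B, G_B2A, real_A, real_B, weight=5.0):
    # Each generator should act (roughly) as an identity map on images
    # that already belong to its output domain.
    loss_idt_B = F.l1_loss(G_A2B(real_B), real_B)  # G_A2B applied to B should return B
    loss_idt_A = F.l1_loss(G_B2A(real_A), real_A)  # G_B2A applied to A should return A
    return weight * (loss_idt_A + loss_idt_B)
```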

libo1712 commented 5 years ago

> Hi, I tried what you guys did, adding Gaussian noise to the input of the discriminator. It does improve the loss curves, but the test images generated by the generator now come out as noise as well. (Previously I had relatively OK images, but my generator loss was going up.)
>
> Thoughts?

Hi, I have the same problem. Did you manage to solve it? Many thanks.

LukasMosser commented 5 years ago

@phamnam95 I think batch size = 1 is OK. I'm not really worried about the fluctuation; it just means you'll have to pick a checkpoint with an appropriate generator loss and not one where it seemingly diverged.

ankittaxak5717 commented 5 years ago

Hello everyone, I am working on a project where I am generating hard triplets using a GAN. I am using a food dataset which has 20 different class labels, and each class label has 200 images. My discriminator predicts the fake label in most cases, even for real feature embeddings. How do I deal with this problem?

My model contains a feature extractor, a generator, and a discriminator.

A triplet here means (an anchor image, a positive image, a negative image). In my model I pass this triplet to the feature_extractor and get 3 embeddings. These three embeddings are then passed to the generator, which produces hard triplets using the triplet loss. The embeddings from the feature_extractor are passed to the discriminator as real embeddings, and the generator passes its hard triplet embeddings to the discriminator as the fake ones.

My problem is that the discriminator predicts both feature_extractor_real_embedding and generator_fake_embeddings as fake most of the time. I am working from this research paper (http://openaccess.thecvf.com/content_ECCV_2018/papers/Yiru_Zhao_A_Principled_Approach_ECCV_2018_paper.pdf). Can anyone suggest how to deal with this problem? Please @LukasMosser @vijayvee

SuperBruceJia commented 5 years ago

Hello guys, I trained a GAN to generate more data for my EEG motor imagery classification tasks. Since the signals are unstable and noisy, how can I know that the data the GAN generates belongs to a certain category rather than to others, when I only use a single class of data as the ground-truth input?

nsusvitlana commented 5 years ago

Did you try batch normalization in the discriminator? It helped me out when I encountered the same problem.
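A minimal sketch of what that can look like in a DCGAN-style discriminator (the layer sizes are illustrative and sized for 32x32 inputs; following the DCGAN recipe, the input block skips BatchNorm):

```python
import torch.nn as nn

def make_discriminator(channels=3, ndf=64):
    # DCGAN-style discriminator with BatchNorm after the inner conv layers.
    return nn.Sequential(
        nn.Conv2d(channels, ndf, 4, stride=2, padding=1),     # 32x32 -> 16x16, no BN here
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1),      # 16x16 -> 8x8
        nn.BatchNorm2d(ndf * 2),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf * 2, ndf * 4, 4, stride=2, padding=1),  # 8x8 -> 4x4
        nn.BatchNorm2d(ndf * 4),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf * 4, 1, 4, stride=1, padding=0),        # 4x4 -> 1x1 raw logit
    )
```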

gentlezr commented 5 years ago

Hi, adding noise really works for me.

AlliedToasters commented 5 years ago

Hey all,

Having played with GANs obsessively for a few weeks now, I've started to notice a few distinct collapse modes:

D overpowers G. G does not change (loss roughly static) while D slowly, steadily goes to 0.

In this case, adding dropout to any/all layers of D helps stabilize.

Another case: G overpowers D. It just feeds garbage to D and D does not discriminate. This one has been harder for me to solve! Adding noise to G in ALL layers, with gradual annealing (lowering the noise slightly each iteration), was the solution.

A third failure state: G and D are roughly balanced but D is more consistent; occasional "spikes" come along, associated with very high gradient norms. These come with dramatic updates to G and indicate to me that I should increase regularization on D so we get more frequent, less dramatic updates to G.

[image]
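A PyTorch-style sketch of the first two fixes described above, i.e. dropout in the discriminator and per-layer Gaussian noise in the generator that is annealed over training (the layer sizes, dropout rate, and annealing schedule are illustrative):

```python
import torch
import torch.nn as nn

class AnnealedGaussianNoise(nn.Module):
    # Adds zero-mean Gaussian noise; its stddev is lowered externally each iteration.
    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std > 0:
            return x + self.std * torch.randn_like(x)
        return x

# Dropout in any/all discriminator layers:
d_block = nn.Sequential(nn.Linear(256, 256), nn.LeakyReLU(0.2), nn.Dropout(0.3))

# Noise in all generator layers, annealed over training:
noise_layers = [AnnealedGaussianNoise(0.1) for _ in range(3)]
g_block = nn.Sequential(
    nn.Linear(100, 256), noise_layers[0], nn.ReLU(),
    nn.Linear(256, 256), noise_layers[1], nn.ReLU(),
    nn.Linear(256, 784), noise_layers[2], nn.Tanh(),
)

# Each training iteration, lower the noise slightly:
# for layer in noise_layers:
#     layer.std = max(0.0, layer.std - 1e-5)
```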

luisouza commented 5 years ago

> Label switching has also helped for me.
>
> Two updates of discriminator with real_label = 1, fake_label=0 and one update with real_label=0 and fake_label=1.
>
> This is followed by one generator update with real_label = 1 and fake_label = 0.

Where did you implement the label smoothing? After a certain number of iterations inside the training loop? Can you be more specific about that, please? I am trying to implement label smoothing in my DCGAN code, but I am not sure where to put it. Thank you.

luisouza commented 5 years ago

> Another case: G overpowers D. It just feeds garbage to D and D does not discriminate. This one has been harder for me to solve! Adding noise to G in ALL layers, with gradual annealing (lowering the noise slightly each iteration), was the solution.

Hi, how did you implement the second solution (adding noise to G with gradual annealing)? Could you help me, please?

qizhou000 commented 3 years ago

> Probably the problem is that the discriminator overfits. One thing that can lead to this: the discriminator may "notice" that images from the true distribution are matrices of numbers of the form n/255. So adding Gaussian noise to the input images may help avoid the problem. It helped in my case.

Thank you for your insight. I had never thought about the discriminator overfitting in this way.

ZainZhao commented 3 years ago

> I think the discriminator got too strong relative to the generator. Beyond this point, the generator finds it almost impossible to fool the discriminator, hence the increase in its loss. I'm facing a similar problem.

Thank you. I use just a 3-layer MLP as D, and its hidden layer size is very small, yet the capacity of D is still strong. What can I do?

Pixie8888 commented 3 years ago

Hi, I am also facing a situation where the discriminator loss goes to 0 (for both fake and real images) and the generator loss keeps increasing. Any idea how to solve it? I suspect it is because the discriminator is too strong and learning too fast?

[images]

Alexey322 commented 3 years ago

> D overpowers G. G does not change (loss roughly static) while D slowly, steadily goes to 0.

If the GAN loss is static and the discriminator loss goes down, this means your generator can still handle the fake data even as the discriminator improves. Thus, if the generator loss does not change and the discriminator error falls, your model is improving.

DISAPPEARED13 commented 1 year ago

> A third failure state: G and D are roughly balanced but D is more consistent; occasional "spikes" come along, associated with very high gradient norms. [...]

Hi there. Thanks for your post, and thanks for @Alexey322's advice - both are very useful! I am facing the third failure state: even though the G and D losses stay balanced the whole time, some spikes still occur during training. Does that mean the training has still failed? Could you please tell me how to apply more regularization on D? Thanks a lot!

[image: result]