soumith / ganhacks

starter from "How to Train a GAN?" at NIPS2016

Why Discriminator Loss 0 is a failure mode? #36

Open arijitx opened 6 years ago

arijitx commented 6 years ago

In tip 10 you say discriminator loss 0 is a failure mode, but in the paper they say this: [image]

What am I getting wrong here?

Thanks,

ghost commented 6 years ago

I think the whole point of GANs is to have losses that counterbalance one another. Unlike traditional CNNs, we are not dealing with a single loss we want to drive as low as possible. The error you show from the paper is indeed D's error, but you must also consider G's error, which is roughly the opposite of D's (this is not exactly true and is implementation-dependent, but the intuition is that loss D = -loss G). Therefore, in a GAN you don't want the D loss to go to zero, because that would mean D is doing too good a job (and, most importantly, G too bad a one): it can easily discriminate between fake and real data, i.e. G's creations are not close enough to the real data.

To sum it up, it is important to define D's loss that way because we do want D to try to reduce it, but the ultimate goal of the whole G-D system is to have the losses balance out. Hence if one loss goes to zero, it's a failure mode (no more learning happens).
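For reference, this is the minimax objective from the original GAN paper (Goodfellow et al., 2014), which makes the "loss D ≈ -loss G" intuition above concrete: D is trained to maximize the value function V while G is trained to minimize it.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]$$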

mateuszkaleta commented 6 years ago

Hence if one loss goes to zero, it's a failure mode (no more learning happens).

I wouldn't say that no more learning happens. For instance, let's say that at the beginning the discriminator's loss goes to 0. But then the generator improves, and in the next iteration the synthetic observations are good enough to fool the discriminator, so its loss increases.

Generally, I would focus on the training process being stable. My understanding is that at the very beginning the discriminator's accuracy should be high (say 90%), meaning that it separates fake observations from real ones well. Then its accuracy should steadily decrease as the generator improves.

The perfect (final) state is when you:

The last point, however, is another story.

ghost commented 6 years ago

@mateuszkaleta AFAICT if discriminator loss goes to zero, there are no more loss gradients flowing (since these gradients are derivatives of loss), so weights of D and G are not modified, so the G cannot "get improved in next iteration" as you propose.
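To see this concretely for the saturating generator loss log(1 - D(G(z))), write D(G(z)) = σ(x) with x the discriminator's logit on a fake sample. Once the discriminator confidently rejects fakes (σ(x) ≈ 0), the gradient reaching the generator vanishes, whereas the commonly used non-saturating loss -log D(G(z)) keeps a usable gradient:

$$\frac{\partial}{\partial x}\log\bigl(1 - \sigma(x)\bigr) = -\sigma(x) \xrightarrow[\sigma(x)\to 0]{} 0, \qquad \frac{\partial}{\partial x}\bigl(-\log \sigma(x)\bigr) = \sigma(x) - 1 \xrightarrow[\sigma(x)\to 0]{} -1$$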

sunbau commented 5 years ago

What should I do to prevent a failure mode? Does anyone have any suggestions? Thanks!

KrnTneja commented 5 years ago

When I trained a DCGAN on a celebrity face dataset, my discriminator quickly converged to zero loss and no more learning happened. But I was able to solve this problem in my case.

The error was that I was using a sigmoid layer at the discriminator output and applying binary cross-entropy (BCE) loss to that output. When I removed the sigmoid layer and computed BCE directly on the logits instead, it worked like a charm.

This is a well-known numerical instability when dealing with exponentials and logarithms. Essentially, very large positive logits were rounded to a probability of 1 and very negative logits to 0; this doesn't happen when the logits are used directly, because the loss is then computed with the log-sum-exp trick.

It's also my understanding that the loss can never truly reach zero, since the logits can't be -inf or +inf. So there must be some rounding going on whenever you see a loss of exactly zero.
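As a small illustration of that rounding (an illustrative sketch, not code from this thread; the logit value is made up):

```python
import torch
import torch.nn as nn

# A discriminator that is very confident about a real sample (target = 1).
logit = torch.tensor([20.0])
target = torch.ones(1)

# sigmoid(20) rounds to exactly 1.0 in float32, so BCELoss reports 0
# and the gradient through the sigmoid vanishes.
prob = torch.sigmoid(logit)
print(nn.BCELoss()(prob, target))             # tensor(0.)

# BCEWithLogitsLoss works on the raw logit with a stable log-sum-exp
# formulation, so the tiny but nonzero loss survives.
print(nn.BCEWithLogitsLoss()(logit, target))  # ~2.06e-09
```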

John1231983 commented 5 years ago

@KrnTneja: Thanks for your trick. Could you provide any code for it? I am also hitting the problem of the D loss going to zero.

KrnTneja commented 5 years ago

@KrnTneja: Thanks for your trick. Could you provide any code for it? I am also hitting the problem of the D loss going to zero.

There isn't really any code to show. Just ensure that the last layer of your discriminator is not a sigmoid layer, i.e. its output shouldn't be constrained to [0,1]. I was using PyTorch, where I had to use torch.nn.BCEWithLogitsLoss instead of torch.nn.BCELoss.
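That said, a minimal sketch of what the change looks like in PyTorch (the architecture below is an illustrative DCGAN-style discriminator for 64x64 RGB images, not the code discussed in this thread):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """DCGAN-style discriminator that outputs raw logits (no final Sigmoid)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),    # 64x64 -> 32x32
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),   # 32x32 -> 16x16
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 16),                        # 16x16 -> 1x1 logit
            # no nn.Sigmoid() here
        )

    def forward(self, x):
        return self.net(x).view(-1)  # raw logits

D = Discriminator()
criterion = nn.BCEWithLogitsLoss()  # applies the sigmoid internally, stably

real = torch.randn(8, 3, 64, 64)    # stand-in batch of real images
fake = torch.randn(8, 3, 64, 64)    # stand-in batch of generated images
d_loss = criterion(D(real), torch.ones(8)) + criterion(D(fake), torch.zeros(8))
```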

arpita739 commented 4 years ago

Hey, have you found any solution to this? I am having the same problem, and because of it I am not getting any generated images.

moulicm111 commented 4 years ago

A discriminator loss of 0 means the discriminator is easily spotting the images produced by the generator. This can happen in some cases, for example when the generator leaves checkerboard artifacts.

moulicm111 commented 4 years ago

This may also occur when the total generator loss is the sum of two losses and the generator mostly minimizes the other loss, because the weighting factor for it is larger.
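A hypothetical illustration of that situation (the loss names and weights below are made up): when the auxiliary term carries a much larger weight, the generator's gradient is dominated by it, the adversarial term is effectively ignored, and D can win easily.

```python
import torch

# Hypothetical weighting: the reconstruction term dominates the total loss.
lambda_adv, lambda_rec = 1.0, 100.0

adv_loss = torch.tensor(0.7)  # placeholder adversarial loss for G
rec_loss = torch.tensor(0.3)  # placeholder reconstruction (e.g. L1) loss

# The generator mostly optimizes the heavily weighted reconstruction term.
g_loss = lambda_adv * adv_loss + lambda_rec * rec_loss
```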

DISAPPEARED13 commented 3 years ago

I faced the same problem when training a CycleGAN, with torch.sigmoid(D(fake_img)) and GANloss: BCELoss(), and eventually my G fell into a failure mode... Now I'm trying BCEWithLogitsLoss() to see what happens. Hope it will work, and thank you @KrnTneja!

6xw commented 2 years ago

@KrnTneja: Thanks for your trick. Could you provide any code for it? I am also hitting the problem of the D loss going to zero.

There isn't really any code to show. Just ensure that the last layer of your discriminator is not a sigmoid layer, i.e. its output shouldn't be constrained to [0,1]. I was using PyTorch, where I had to use torch.nn.BCEWithLogitsLoss instead of torch.nn.BCELoss.

What's the difference between using BCEWithLogitsLoss on logit outputs and using BCELoss on sigmoid outputs?

Eliacus commented 2 years ago

@6xw BCEWithLogitsLoss uses the log-sum-exp trick, which prevents overflow and thus increases numerical stability.
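For reference, one common numerically stable way to write BCE on a raw logit x with target y (the kind of rewriting BCEWithLogitsLoss relies on), which avoids ever computing σ(x) explicitly:

$$\ell(x, y) = -\,y \log \sigma(x) - (1-y)\log\bigl(1-\sigma(x)\bigr) = \max(x, 0) - x\,y + \log\bigl(1 + e^{-|x|}\bigr)$$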

Pravin770 commented 2 years ago

@KrnTneja @mateuszkaleta

I have commented out the Sigmoid layer in the discriminator and used BCEWithLogitsLoss and the Adam optimizer with a learning rate of 0.0001, but the discriminator loss still reaches zero after 30 epochs. Is there any way to fix that?

Raha304 commented 2 years ago

@KrnTneja @mateuszkaleta

I have commented out the Sigmoid layer in the discriminator and used BCEWithLogitsLoss and the Adam optimizer with a learning rate of 0.0001, but the discriminator loss still reaches zero after 30 epochs. Is there any way to fix that?

Did you find any solution?