arijitx opened this issue 6 years ago
I think the whole point of GANs is to have losses that counterbalance one another. Unlike in traditional CNNs, we are not in the presence of a single loss we wish to reduce as much as possible. The error you show from the paper is indeed D's error, but you must also consider G's error, which is the opposite of D's (this is not exactly true and is implementation-dependent, but the intuition is that loss D = - loss G). Therefore, in the GAN setting you don't want the D loss to go to zero, because that would mean D is doing too good a job (and, most importantly, G too bad a one), i.e. D can easily discriminate between fake and real data (i.e. G's creations are not close enough to real data).
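For reference (not from this thread, but from the original GAN paper by Goodfellow et al., 2014), the value function that makes the "loss D = - loss G" intuition concrete is the zero-sum minimax objective, in which D maximizes exactly the quantity G minimizes:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$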
To sum it up: it's important to define D's loss that way because we do want D to try to reduce it, but the ultimate goal of the whole G-D system is to have the losses balance out. Hence if one loss goes to zero, it's a failure mode (no more learning happens).
Hence if one loss goes to zero, it's a failure mode (no more learning happens).
I wouldn't say that no more learning happens. For instance, let's say that at the beginning the discriminator's loss goes to 0. But then the generator improves, and in the next iteration the synthetic observations are good enough to fool the discriminator, so its loss increases.
Generally, I would focus on the training process being stable. My understanding is that at the very beginning the discriminator's accuracy should be high (say 90%), meaning that it separates fake observations from real ones well. Then its accuracy should steadily decrease toward 50% (and its loss increase) as the generator improves.
The perfect (final) state is when you:
The last point, however, is another story.
@mateuszkaleta AFAICT, if the discriminator loss goes to zero, there are no more loss gradients flowing (since these gradients are derivatives of the loss), so the weights of D and G are not modified, and G cannot "get improved in the next iteration" as you propose.
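A quick numeric sketch of that claim (a toy logit value, written in PyTorch since that's what comes up later in this thread; not code from anyone here): when D is very confident that a sample is fake, the sigmoid saturates and almost no gradient flows back to G through the minimax generator loss.

```python
import torch

# D's logit on a fake sample when D is quite confident it is fake.
fake_logit = torch.tensor(-10.0, requires_grad=True)

# Minimax generator loss log(1 - D(G(z))), with D(G(z)) = sigmoid(logit).
g_loss = torch.log(1.0 - torch.sigmoid(fake_logit))
g_loss.backward()

print(g_loss.item())      # ~ -4.5e-05: the loss surface is nearly flat here
print(fake_logit.grad)    # ~ tensor(-4.5398e-05): only a tiny gradient reaches G
```

The more confident D becomes (the more negative the logit), the smaller this gradient gets, which is the sense in which learning stalls.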
What should I do to prevent a failure mode? Does anyone have any suggestions? Thanks!
When I trained a DCGAN on a celebrity face dataset, my discriminator quickly converged to zero loss and no more learning happened. But I was able to solve this problem in my case.
The error was that I was using a sigmoid layer at the discriminator output and applying binary cross-entropy (BCE) loss to that output. When I instead dropped the sigmoid layer and computed BCE directly on the logits, it worked like a charm.
This is a well-known numerical instability when dealing with exponentials and logarithms. Essentially, very high positive logits were being rounded to a probability of 1 and very low negative logits to 0. That doesn't happen when I use the logits directly, because the loss is then computed with the log-sum-exp trick.
It's also my understanding that the loss can never truly reach zero, since the logits can't possibly be -inf or +inf, so there must be some rounding involved whenever you see a loss of exactly zero.
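That rounding is easy to reproduce (a toy check, assuming float32 PyTorch): once the sigmoid output rounds to exactly 1.0, the BCE term for a correctly classified real sample is reported as exactly 0, even though the mathematically exact loss, log(1 + e^-40) ≈ 4e-18, is merely tiny.

```python
import torch

logit = torch.tensor([40.0])     # D is very confident this sample is real
target = torch.tensor([1.0])     # and it is indeed real

prob = torch.sigmoid(logit)      # 1 / (1 + e^-40) rounds to exactly 1.0 in float32
loss = torch.nn.BCELoss()(prob, target)

print(prob)    # tensor([1.])
print(loss)    # tensor(0.) -- exactly zero only because of the rounding above
```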
@KrnTneja: Thanks for your tricks. Could you provide any code to do it? I also hit the problem of D's loss going to zero.
There isn't really any code to show. Just make sure the last layer of your discriminator is not a sigmoid layer, i.e. its output shouldn't be constrained to [0, 1]. I was using PyTorch, where I had to use torch.nn.BCEWithLogitsLoss instead of torch.nn.BCELoss.
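For anyone who still wants it spelled out, here is a minimal PyTorch sketch of the change (the toy discriminator and the random stand-in batches are hypothetical, not code from this repo):

```python
import torch
import torch.nn as nn

# Toy discriminator: note there is NO nn.Sigmoid() at the end, so it returns raw logits.
discriminator = nn.Sequential(
    nn.Linear(784, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),               # unbounded logit, not a probability in [0, 1]
)

criterion = nn.BCEWithLogitsLoss()   # applies the sigmoid and BCE together, stably

real_batch = torch.randn(16, 784)    # stand-in for a batch of real data
fake_batch = torch.randn(16, 784)    # stand-in for G(z).detach()

d_loss = (criterion(discriminator(real_batch), torch.ones(16, 1)) +
          criterion(discriminator(fake_batch), torch.zeros(16, 1)))
d_loss.backward()                    # gradients flow from the logits, not through a saturated sigmoid
```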
Hey, have you found any solution to this? I'm running into the same condition, and because of it I'm not getting any generated images.
A discriminator loss of 0 means the discriminator is easily spotting the images produced by the generator. This can happen in some cases, e.g. when the generator leaves checkerboard artifacts. It may also occur when the total generator loss is the sum of two losses and the generator concentrates on minimizing the other loss, because its weighting factor is larger.
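To make that last point concrete, here is a hypothetical example (all names, shapes, and weights are made up for illustration) of a generator objective where an auxiliary term dominates the adversarial term:

```python
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()
aux_criterion = nn.L1Loss()                   # e.g. a reconstruction / cycle-consistency term

d_logits_on_fake = torch.randn(8, 1)          # stand-in for D(G(z))
fake_images = torch.randn(8, 3, 64, 64)       # stand-in for G(z)
real_images = torch.randn(8, 3, 64, 64)       # stand-in for the targets

lambda_adv, lambda_aux = 1.0, 100.0           # the auxiliary weight dominates
g_loss = (lambda_adv * adv_criterion(d_logits_on_fake, torch.ones(8, 1)) +
          lambda_aux * aux_criterion(fake_images, real_images))
# With lambda_aux >> lambda_adv, the gradients mostly optimize the auxiliary term,
# fooling D barely affects g_loss, and D's loss can drift toward zero.
```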
I faced the same problem when training a CycleGAN with torch.sigmoid(D(fake_img)) and BCELoss() as the GAN loss, and eventually my G fell into a failure mode... Now I'm trying BCEWithLogitsLoss() to see what happens. Hope it works, and thank you @KrnTneja!
What's the difference between combining BCEWithLogitsLoss with logit outputs and combining BCELoss with sigmoid outputs?
@6xw BCEWithLogitsLoss combines the sigmoid and the BCE into a single operation and computes it with the log-sum-exp trick, which prevents overflow and thus improves numerical stability.
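A small numeric check of the difference (toy logit value; the clamp at -100 is documented behaviour of PyTorch's BCELoss): for a confidently wrong prediction, the separate sigmoid saturates to exactly 1.0 and the log blows up, while the fused loss returns the exact value.

```python
import torch

logit = torch.tensor([40.0])     # D is confidently (and wrongly) saying "real"
target = torch.tensor([0.0])     # ground truth: fake

prob = torch.sigmoid(logit)      # rounds to exactly 1.0 in float32

print(torch.nn.BCELoss()(prob, target))             # tensor(100.) -- -log(1 - 1) would be inf,
                                                    # but BCELoss clamps the log at -100
print(torch.nn.BCEWithLogitsLoss()(logit, target))  # tensor(40.) -- the exact -log(1 - sigmoid(40))
```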
@KrnTneja @mateuszkaleta
I have commented out the Sigmoid layer in the discriminator and used BCEWithLogitsLoss with the Adam optimizer at a learning rate of 0.0001, but the discriminator loss still reaches zero after 30 epochs. Is there any way to fix that?
Were you able to find any solution?
In 10. you say that a discriminator loss of 0 is a failure mode, but in the paper they say that:
What am I getting wrong here?
Thanks,