tdeboissiere / DeepLearningImplementations

Implementation of recent Deep Learning papers

Your loss function seems problematic #35

Closed kunrenzhilu closed 7 years ago

kunrenzhilu commented 7 years ago

It seems that you minimize disc_loss_real first and then minimize disc_loss_gen. However, according to the original paper, you should optimize both simultaneously, since that allows the optimizer to follow the gradient of the whole objective in a single step instead of following either the direction of loss_real or loss_gen alone.
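Roughly, this is the difference I mean (a sketch only; `discriminator` stands for a compiled Keras critic, and the names and ±1 label convention are illustrative, not your exact code):

```python
import numpy as np

# X_real, X_gen: numpy batches of real and generated samples
# taken from the training loop; batch_size is their common size.

# Two separate updates: each call computes gradients for one loss
# term and immediately applies a weight update.
disc_loss_real = discriminator.train_on_batch(X_real, -np.ones((batch_size, 1)))
disc_loss_gen = discriminator.train_on_batch(X_gen, np.ones((batch_size, 1)))

# One combined update: gradients of both terms contribute to a
# single weight update, as in Algorithm 1 of the paper.
X = np.concatenate([X_real, X_gen])
y = np.concatenate([-np.ones((batch_size, 1)), np.ones((batch_size, 1))])
disc_loss = discriminator.train_on_batch(X, y)
```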

tdeboissiere commented 7 years ago

Which loss function (which GAN) are you referring to?

kunrenzhilu commented 7 years ago

Oops, sorry, I thought this issue was already filed under WassersteinGAN.

tdeboissiere commented 7 years ago

As far as I can tell, the authors implement the discriminator optimization in two steps as well. I have tried both and got much better convergence with the two-step approach.

kunrenzhilu commented 7 years ago

Sorry, now I see the reason: when updating netD, the loss function is a linear combination of disc_loss_real and disc_loss_gen, so the gradient of each part can be computed separately. However, if you look at @martinarjovsky's code (line 199), the weights are updated once, globally, after the two parts have been combined, whereas in your code you really are updating the weights separately, right? Any comment on this? Since you say it converges much better, I think it is a good observation and might open some space for research :)

Btw, just curious: why do you put yreal * ypred in models.wasserstein()? It is not straightforward for me and took me some contemplation to get it...
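For reference, the definition I am asking about looks roughly like this (paraphrased, so the exact code in models.py may differ):

```python
import keras.backend as K

def wasserstein(y_true, y_pred):
    # With labels in {-1, +1}, multiplying the critic output by the label
    # flips its sign, so the same loss can push real scores one way and
    # generated scores the other while Keras still just "minimizes a loss".
    return K.mean(y_true * y_pred)
```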

Thanks!

tdeboissiere commented 7 years ago

Fair point, my explanation was off.

I think the issue here is Keras' API.

Stacking X_real and X_gen together does not match the original algorithm and perturbs BatchNormalization (see Soumith GAN tricks). So I separated the updates to the discriminator into two distinct parts, and it trained OK.

To correctly mimic the original code, I would have had to use more involved Keras hacking which I'd rather avoid.
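For reference, my understanding of the discriminator update in the original PyTorch code is the following pattern (a rough sketch from memory, not an exact copy; netD is assumed to return the mean critic score as a scalar):

```python
import torch

one = torch.FloatTensor([1])
mone = one * -1  # used to flip the gradient sign for the fake term

netD.zero_grad()
errD_real = netD(X_real)           # scalar mean score on real samples
errD_real.backward(one)            # accumulate gradients for the real term
errD_fake = netD(X_fake.detach())  # scalar mean score on generated samples
errD_fake.backward(mone)           # accumulate gradients for the fake term
errD = errD_real - errD_fake
optimizerD.step()                  # one weight update for both terms combined
```

As far as I can tell, Keras' standard API offers no clean way to accumulate gradients across two forward passes before a single update, which is the hacking I meant above.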

jiangzidong commented 7 years ago

@tdeboissiere Can you please elaborate on "Stacking X_real and X_gen together does not match the original algorithm and perturbs BatchNormalization (see Soumith GAN tricks)"?

Thanks a lot