tdeboissiere / DeepLearningImplementations

Implementation of recent Deep Learning papers

Confusion about your Wasserstein loss function vs one described in paper #39

kaijfox closed this issue 7 years ago

kaijfox commented 7 years ago

The approximate Wasserstein loss you define is:

from keras import backend as K

def wasserstein(y_true, y_pred):
    # Batch mean of the label-weighted critic outputs.
    return K.mean(y_true * y_pred)

But in the paper (Algorithm 1) they optimize:

\frac{1}{m}\sum_{i=1}^{m} f_w(x^{(i)}) - \frac{1}{m}\sum_{i=1}^{m} f_w(g_\theta(z^{(i)}))

where x^{(i)} is the batch of real images, z^{(i)} is the noise input to the generator g_\theta, and f_w is the critic.

Your objective seems to work just fine, since the network clearly learns well. I'm confused about where the multiplication comes from, though. Could you explain that part of your code?

tdeboissiere commented 7 years ago

That's exactly what I do. In this case, y_true = 1 or -1, so K.mean(y_true * y_pred) reproduces one term of the equation per batch: with y_true = 1 it is the mean critic output on that batch, and with y_true = -1 it is the negated mean on the other. Training on a real batch and a generated batch with opposite labels therefore optimizes the full expression from Algorithm 1 (up to an overall sign, since Keras minimizes the loss).
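Here is a minimal numerical check of that equivalence (a sketch, not code from the repo; it assumes real samples are labeled -1 and generated samples +1, so that Keras's minimization matches the paper's maximization; the reverse convention also works, it just flips the sign):

import numpy as np
from keras import backend as K

def wasserstein(y_true, y_pred):
    return K.mean(y_true * y_pred)

# Toy critic outputs for a real batch and a generated batch.
f_real = np.random.randn(64, 1).astype("float32")  # f_w(x^(i))
f_fake = np.random.randn(64, 1).astype("float32")  # f_w(g_theta(z^(i)))

ones = np.ones((64, 1), dtype="float32")
loss_real = K.eval(wasserstein(K.constant(-ones), K.constant(f_real)))  # -mean f_w(x)
loss_fake = K.eval(wasserstein(K.constant(ones), K.constant(f_fake)))   # +mean f_w(g(z))

# Minimizing loss_real + loss_fake is the same as maximizing
# mean(f_w(x)) - mean(f_w(g_theta(z))), the objective from Algorithm 1.
assert np.isclose(loss_real + loss_fake, -(f_real.mean() - f_fake.mean()))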

I give more details on the way I optimize this GAN here: https://github.com/tdeboissiere/DeepLearningImplementations/tree/master/WassersteinGAN/src/model
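Schematically, one training step then looks like this (a sketch following Algorithm 1's defaults rather than the exact repo code; the batch size, noise dimension of 100, clip value of 0.01, and n_critic = 5 are assumptions, and critic and gan, the generator stacked on the frozen critic, are assumed to be compiled with the wasserstein loss above):

import numpy as np

def train_step(critic, generator, gan, X_real, batch_size=64, noise_dim=100,
               n_critic=5, clip_value=0.01):
    # Update the critic n_critic times per generator update (Algorithm 1).
    for _ in range(n_critic):
        idx = np.random.randint(0, X_real.shape[0], batch_size)
        noise = np.random.normal(0, 1, (batch_size, noise_dim))
        X_fake = generator.predict(noise)
        # Real batch labeled -1, generated batch +1: the two batch losses
        # sum to -(mean f_w(x) - mean f_w(g_theta(z))).
        critic.train_on_batch(X_real[idx], -np.ones((batch_size, 1)))
        critic.train_on_batch(X_fake, np.ones((batch_size, 1)))
        # Clip critic weights to keep f_w approximately Lipschitz.
        for layer in critic.layers:
            clipped = [np.clip(w, -clip_value, clip_value) for w in layer.get_weights()]
            layer.set_weights(clipped)
    # Generator update through the frozen critic: labeling the fakes -1
    # makes the generator minimize -mean(f_w(g_theta(z))).
    noise = np.random.normal(0, 1, (batch_size, noise_dim))
    gan.train_on_batch(noise, -np.ones((batch_size, 1)))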