tdeboissiere / DeepLearningImplementations

Implementation of recent Deep Learning papers

Confusion about your Wasserstein loss function vs one described in paper #39

kaijfox closed this issue 7 years ago

kaijfox commented 7 years ago

The approximate Wasserstein loss you define is:

from keras import backend as K

def wasserstein(y_true, y_pred):
    # Batch mean of the label-weighted critic outputs.
    return K.mean(y_true * y_pred)

But in the paper (Algorithm 1) they optimize:

\frac{1}{m}\sum_{i=1}^{m} f_w(x^{(i)}) - \frac{1}{m}\sum_{i=1}^{m} f_w(g_\theta(z^{(i)}))

where x^{(i)} is the batch of real images, z^{(i)} is the noise input to the generator g_\theta, and f_w is the critic.

Your objective seems to work just fine, since the network clearly learns well. I'm confused about where the multiplication comes from, though. Could you explain that part of your code?

tdeboissiere commented 7 years ago

That's exactly what I do. In this case, y_true = 1 or -1, so K.mean(y_true * y_pred) reproduces one term of the equation per batch: with y_true = 1 it is the mean critic output on that batch, and with y_true = -1 it is the negated mean on the other. Training on a real batch and a generated batch with opposite labels therefore optimizes the full expression from Algorithm 1 (up to an overall sign, since Keras minimizes the loss).
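Here is a minimal numerical check of that equivalence (a sketch, not code from the repo; it assumes real samples are labeled -1 and generated samples +1, so that Keras's minimization matches the paper's maximization; the reverse convention also works, it just flips the sign):

import numpy as np
from keras import backend as K

def wasserstein(y_true, y_pred):
    return K.mean(y_true * y_pred)

# Toy critic outputs for a real batch and a generated batch.
f_real = np.random.randn(64, 1).astype("float32")  # f_w(x^(i))
f_fake = np.random.randn(64, 1).astype("float32")  # f_w(g_theta(z^(i)))

ones = np.ones((64, 1), dtype="float32")
loss_real = K.eval(wasserstein(K.constant(-ones), K.constant(f_real)))  # -mean f_w(x)
loss_fake = K.eval(wasserstein(K.constant(ones), K.constant(f_fake)))   # +mean f_w(g(z))

# Minimizing loss_real + loss_fake is the same as maximizing
# mean(f_w(x)) - mean(f_w(g_theta(z))), the objective from Algorithm 1.
assert np.isclose(loss_real + loss_fake, -(f_real.mean() - f_fake.mean()))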

I give more details on the way I optimize this GAN here: https://github.com/tdeboissiere/DeepLearningImplementations/tree/master/WassersteinGAN/src/model
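Schematically, one training step then looks like this (a sketch following Algorithm 1's defaults rather than the exact repo code; the batch size, noise dimension of 100, clip value of 0.01, and n_critic = 5 are assumptions, and critic and gan, the generator stacked on the frozen critic, are assumed to be compiled with the wasserstein loss above):

import numpy as np

def train_step(critic, generator, gan, X_real, batch_size=64, noise_dim=100,
               n_critic=5, clip_value=0.01):
    # Update the critic n_critic times per generator update (Algorithm 1).
    for _ in range(n_critic):
        idx = np.random.randint(0, X_real.shape[0], batch_size)
        noise = np.random.normal(0, 1, (batch_size, noise_dim))
        X_fake = generator.predict(noise)
        # Real batch labeled -1, generated batch +1: the two batch losses
        # sum to -(mean f_w(x) - mean f_w(g_theta(z))).
        critic.train_on_batch(X_real[idx], -np.ones((batch_size, 1)))
        critic.train_on_batch(X_fake, np.ones((batch_size, 1)))
        # Clip critic weights to keep f_w approximately Lipschitz.
        for layer in critic.layers:
            clipped = [np.clip(w, -clip_value, clip_value) for w in layer.get_weights()]
            layer.set_weights(clipped)
    # Generator update through the frozen critic: labeling the fakes -1
    # makes the generator minimize -mean(f_w(g_theta(z))).
    noise = np.random.normal(0, 1, (batch_size, noise_dim))
    gan.train_on_batch(noise, -np.ones((batch_size, 1)))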