tdeboissiere / DeepLearningImplementations

Implementation of recent Deep Learning papers

Confusion with Definition of Wasserstein #17

Closed pengpaiSH closed 7 years ago

pengpaiSH commented 7 years ago

In the code, a loss objective named wasserstein is defined as follows:

    def wasserstein(y_true, y_pred):
        return K.mean(y_true * y_pred)

After reading the original paper, we know WGAN wants to minimize the Wasserstein distance, which the critic estimates as the difference between the expectation of its scores on real samples and on generated samples. Why does K.mean(y_true * y_pred) implement such a distance?

tdeboissiere commented 7 years ago

Have a look at the detailed implementation:

There are two passes to update D: one which maximizes the score of real samples, and one which minimizes the score of fake samples.

For real samples y_true = -1, hence we minimize -mean(y_pred), or in other words maximize mean(y_pred). Conversely for fake samples (y_true = +1).

All in all, this allows us to maximize (true_score - fake_score).

You could do that in a single pass by concatenating real and fake samples, but I have found that this converges less well.
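
For concreteness, here is a minimal sketch of that two-pass update, assuming Keras with a toy one-layer critic; the model, data shapes, and optimizer below are illustrative stand-ins, not the repo's actual code:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense
    from keras import backend as K

    def wasserstein(y_true, y_pred):
        # Keras minimizes this; the sign of y_true sets the direction.
        return K.mean(y_true * y_pred)

    # Toy critic: a single linear score on 10-dim inputs (illustrative only).
    critic = Sequential([Dense(1, input_dim=10)])
    critic.compile(loss=wasserstein, optimizer='rmsprop')

    batch_size = 32
    X_real = np.random.randn(batch_size, 10)  # stand-in for a real batch
    X_fake = np.random.randn(batch_size, 10)  # stand-in for generated samples

    # Pass 1: y_true = -1 -> minimize -mean(score) -> maximize score on real.
    critic.train_on_batch(X_real, -np.ones((batch_size, 1)))
    # Pass 2: y_true = +1 -> minimize +mean(score) -> minimize score on fake.
    critic.train_on_batch(X_fake, np.ones((batch_size, 1)))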

pengpaiSH commented 7 years ago

@tdeboissiere Thank you for your quick feedback! Let me summarize your comments, and please correct me if I am wrong. Our objective (from the point of view of the discriminator) is to maximize the distance true_score - fake_score. Let's define y = -1 for real samples and y = +1 for the fake/generated samples. For the real ones, we have to max y_hat in order to maximize true_score - fake_score, i.e. min -y_hat. Similarly, for the fake samples, we want to min y_hat. We can combine both cases into a single equation: min y * y_hat.
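
As a quick sanity check of that single-equation form, a tiny NumPy example with made-up scores:

    import numpy as np

    # Hypothetical critic scores on a real batch and a fake batch.
    real_scores = np.array([0.8, 0.6, 0.9])
    fake_scores = np.array([0.1, 0.3, 0.2])

    # Single-equation loss: mean(y * y_hat), with y = -1 (real), y = +1 (fake).
    loss = np.mean(-1 * real_scores) + np.mean(+1 * fake_scores)

    # Minimizing this loss is exactly maximizing true_score - fake_score.
    assert np.isclose(loss, -(real_scores.mean() - fake_scores.mean()))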

And one question is: why do you define y_true = -1 for real samples?

tdeboissiere commented 7 years ago

y_true does not have any "target" meaning.

Recall that all loss functions in Keras are minimized. For real samples, our goal is to maximize true_score = K.mean(y_pred). To write a valid Keras loss, we define it as K.mean(y_true * y_pred) with y_true = -1, so that minimizing -K.mean(y_pred) is equivalent to maximizing K.mean(y_pred).
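
One quick way to see the sign flip, assuming a TensorFlow or Theano backend (the tiny variable below is purely illustrative):

    import numpy as np
    from keras import backend as K

    # With y_true = -1 the loss is mean(-y_pred); its gradient w.r.t. y_pred
    # is -1, so a gradient descent step *increases* y_pred: minimizing the
    # loss maximizes the critic score.
    y_pred = K.variable(np.array([[0.5]]))
    loss = K.mean(-1 * y_pred)
    grad = K.gradients(loss, [y_pred])[0]
    print(K.eval(grad))  # [[-1.]]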

pengpaiSH commented 7 years ago

@tdeboissiere Thank you for your explanation! That is really helpful!