Closed pengpaiSH closed 7 years ago
Have a look at the detailed implementation:
there are two passes to update D: one that maximizes the score of real samples and one that minimizes the score of fake samples.
For real samples `y_true = -1`, hence we minimize `-mean(y_pred)`, or in other words maximize `mean(y_pred)`. Conversely for fake samples.
All in all, this allows us to maximize `(true_score - fake_score)`.
You could do that in a single pass, concatenating real and fake samples, but I have found that this converges less well.
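A minimal numpy sketch of this two-pass critic update, using a hypothetical linear critic and synthetic real/fake batches (the data, learning rate, and clipping value here are illustrative assumptions, not the repo's actual settings):

```python
import numpy as np

def wasserstein_loss(y_true, y_pred):
    # Keras-style Wasserstein loss: all Keras losses are minimized,
    # so y_true = -1 for real samples flips the sign of the score.
    return np.mean(y_true * y_pred)

rng = np.random.default_rng(0)
w = np.zeros(2)                            # toy linear critic: score = x @ w
real = rng.normal(loc=+1.0, size=(64, 2))  # stand-in "real" batch
fake = rng.normal(loc=-1.0, size=(64, 2))  # stand-in "fake" batch

lr = 0.1
for _ in range(100):
    # Pass 1: real batch with y_true = -1 -> minimize -mean(score)
    grad_real = np.mean(-1 * real, axis=0)  # d/dw of mean(-1 * x @ w)
    w -= lr * grad_real
    # Pass 2: fake batch with y_true = +1 -> minimize +mean(score)
    grad_fake = np.mean(+1 * fake, axis=0)
    w -= lr * grad_fake
    # weight clipping, as in the original WGAN, to bound the critic
    w = np.clip(w, -1.0, 1.0)

true_score = np.mean(real @ w)
fake_score = np.mean(fake @ w)
print(true_score > fake_score)  # True: the critic separates real from fake
```

The two passes together drive `true_score - fake_score` upward, which is exactly the objective described above.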
@tdeboissiere Thank you for your quick feedback! Let me summarize your comments, and please correct me if I am wrong. Our objective (from the point of view of the discriminator) is to maximize the distance `true_score - fake_score`. Let's define `y = -1` for real samples and `y = +1` for fake/generated samples. For real samples, we have to max `y_hat` in order to maximize `true_score - fake_score`, i.e. min `-y_hat`. Similarly, for fake samples, we want to min `y_hat`. We can rewrite both cases as a single equation: min `y * y_hat`.

And one question is: why do you define `y_true = -1` for real samples?
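A quick numeric check of the single-equation form, with hypothetical critic scores: for a batch mixing real samples (`y = -1`) and fake samples (`y = +1`), `mean(y * y_hat)` is proportional to `fake_score - true_score`, so minimizing it maximizes `true_score - fake_score`.

```python
import numpy as np

y     = np.array([-1, -1, +1, +1], dtype=float)  # real, real, fake, fake
y_hat = np.array([2.0, 4.0, -1.0, 3.0])          # hypothetical critic scores

loss = np.mean(y * y_hat)
true_score = np.mean(y_hat[y == -1])  # mean score on real samples: 3.0
fake_score = np.mean(y_hat[y == +1])  # mean score on fake samples: 1.0

# each group is half the batch, so loss = (fake_score - true_score) / 2
print(loss)  # -1.0
```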
`y_true` does not have any "target" meaning here.
Recall that all Keras loss functions are minimized. For real samples, our goal is to maximize `true_score = K.mean(y_pred)`. To write a valid Keras loss function, we therefore define the loss as `K.mean(y_true * y_pred)` with `y_true = -1`, so that we minimize `-K.mean(y_pred)`, which is equivalent to maximizing `K.mean(y_pred)`.
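The sign flip can be checked directly. Below is a numpy stand-in for the loss being discussed (the repo's version uses the Keras backend `K` in place of `np`), applied to a hypothetical batch of scores:

```python
import numpy as np

def wasserstein(y_true, y_pred):
    # numpy stand-in for the Keras loss K.mean(y_true * y_pred)
    return np.mean(y_true * y_pred)

y_pred = np.array([0.5, 2.0, -1.0])  # hypothetical critic scores

# Real batch, y_true = -1: loss = -mean(y_pred), so a higher mean
# score on real samples gives a lower (better) loss.
real_loss = wasserstein(-np.ones_like(y_pred), y_pred)
print(real_loss)  # -0.5

# Fake batch, y_true = +1: loss = +mean(y_pred), so minimizing it
# pushes the scores of fake samples down.
fake_loss = wasserstein(np.ones_like(y_pred), y_pred)
print(fake_loss)  # 0.5
```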
@tdeboissiere Thank you for your explanation! That is really helpful!
In the code, a loss objective named `wasserstein` is defined as follows:

`K.mean(y_true * y_pred)`

After reading the original paper, we know WGAN wants to minimize the Wasserstein distance: the difference between the expectation of the critic's scores on generated samples and on real samples. How does `K.mean(y_true * y_pred)` implement such a distance?