petertulala closed this issue 7 years ago
My fault, using cross-entropy is totally fine, good job. It's shown later in the paper that the cross-entropy H(c, Q(c | G(z,c))) is in fact an upper bound on the conditional entropy H(c | G(z,c)), and the latent code distribution is fixed, hence H(c) can be treated as a constant. Minimizing the cross-entropy therefore maximizes a lower bound on the mutual information.
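A minimal NumPy sketch of that bound, with made-up numbers purely for illustration: for any approximate posterior Q, the cross-entropy E[-log Q(c | x)] is at least the conditional entropy H(c | x), with equality when Q matches the true posterior, so driving the cross-entropy down tightens the mutual-information bound while H(c) stays fixed.

```python
import numpy as np

# Toy joint distribution p(c, x): 2 codes (rows), 3 observations (columns).
# Numbers are illustrative only.
p_joint = np.array([[0.30, 0.10, 0.10],
                    [0.05, 0.15, 0.30]])

# True posterior P(c | x) and conditional entropy H(c | x)
p_c_given_x = p_joint / p_joint.sum(axis=0)
h_c_given_x = -np.sum(p_joint * np.log(p_c_given_x))

# Cross-entropy E_{c,x}[-log Q(c | x)] for an arbitrary approximation Q
q = np.array([[0.6, 0.4, 0.2],
              [0.4, 0.6, 0.8]])  # columns sum to 1
cross_entropy = -np.sum(p_joint * np.log(q))

print(h_c_given_x)    # ~0.54
print(cross_entropy)  # ~0.60 -- always >= H(c | x)
assert cross_entropy >= h_c_given_x
```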
Reopening. I reviewed the Gaussian loss function, but I think it's not correct. This is how the log-likelihood of a Gaussian should be calculated:

log p(x) = -0.5 * log(2*pi) - log(s) - (x - mu)^2 / (2 * s^2)
And this is how the Gaussian loss is calculated in train.py (ignoring the constant): log(s) + 0.5 * ((x - mu) / e^(log(s)))^2
The log(2*pi) term may be skipped, since it is a constant, but I believe the rest is wrong. Correct implementation:
```python
import numpy as np
from keras import backend as K

def gaussian_loss(y_true, y_pred):
    # mean and log_stdev are sliced from y_true, the evaluated sample from y_pred
    mean = y_true[:, 0, :]
    log_stdev = y_true[:, 1, :]  # read as log-variance, log(s**2)
    x = y_pred[:, 0, :]
    # Negative log-likelihood: 0.5 * (log(2*pi) + log(s^2) + (x - mu)^2 / s^2)
    frac = K.square(x - mean) / (K.exp(log_stdev) + K.epsilon())
    return 0.5 * K.mean(np.log(2 * np.pi) + log_stdev + frac)
```
I tested the fixed equation and got the expected result. See also the implementation from OpenAI, which calculates the loss the same way.
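As a quick numerical sanity check, here is a plain-NumPy transcription of the fixed loss (my own sketch, not repo code) compared against scipy.stats.norm.logpdf. It matches exactly when log_stdev is read as the log-variance log(s**2):

```python
import numpy as np
from scipy.stats import norm

# Plain-NumPy version of the proposed loss, per-element (no batch mean)
def gaussian_nll(x, mean, log_var):
    frac = np.square(x - mean) / np.exp(log_var)
    return 0.5 * (np.log(2 * np.pi) + log_var + frac)

rng = np.random.default_rng(0)
mean = rng.normal(size=5)
sigma = np.exp(rng.normal(size=5))
x = rng.normal(size=5)

ours = gaussian_nll(x, mean, np.log(sigma ** 2))
ref = -norm.logpdf(x, loc=mean, scale=sigma)
print(np.allclose(ours, ref))  # True
```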
log(s^2) = 2 log(s). Hence the 0.5 factor goes away (1/2 * log(s^2) = log(s)). So I believe the current implementation is correct. It does ignore the constant factor as you noted.
Your new implementation is correct as well if you treat log_stdev as log(s**2).
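A small numeric illustration of that algebra (my own sketch): predicting log(s) and folding the 0.5 into 2*log(s) gives exactly the same value as predicting log(s**2) and keeping the 0.5 outside.

```python
import numpy as np

x, mu, sigma = 1.3, 0.2, 0.7

# Parameterization 1: network predicts log(sigma); the 0.5 is absorbed into 2*log(sigma)
log_s = np.log(sigma)
loss1 = log_s + 0.5 * ((x - mu) / np.exp(log_s)) ** 2

# Parameterization 2: network predicts log(sigma**2); the 0.5 stays outside
log_s2 = np.log(sigma ** 2)
loss2 = 0.5 * (log_s2 + (x - mu) ** 2 / np.exp(log_s2))

print(np.isclose(loss1, loss2))  # True: same loss, constant log(2*pi)/2 dropped
```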
Thank you for the answer. However, in that case, shouldn't it be multiplied by 0.25 instead of 0.5?
I think if we ignore the constant, the correct loss should be:
```python
epsilon = (y_true - Q_C_mean) / (K.exp(Q_C_logstd) + K.epsilon())
loss_Q_C = Q_C_logstd + 0.25 * K.square(epsilon)
```
We have:
log((2*pi*s^2)^(-0.5) * e^(-0.5 * (x - mu)^2 / s^2)) = cst + log(s^(-1)) - 0.5 * (x - mu)^2 / s^2 = cst - (log(s) + 0.5 * ((x - mu) / e^(log(s)))^2)

which is the expression in my code.
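A numeric double-check of the factor (my own sketch, not from the thread): with 0.5 this expression reproduces scipy's Gaussian log-density exactly, while 0.25 does not.

```python
import numpy as np
from scipy.stats import norm

x, mu, s = 0.9, -0.3, 1.6
cst = -0.5 * np.log(2 * np.pi)

with_half    = cst - (np.log(s) + 0.5  * ((x - mu) / s) ** 2)
with_quarter = cst - (np.log(s) + 0.25 * ((x - mu) / s) ** 2)

print(np.isclose(with_half, norm.logpdf(x, loc=mu, scale=s)))     # True
print(np.isclose(with_quarter, norm.logpdf(x, loc=mu, scale=s)))  # False
```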
Ok thanks, now I see both ways are correct; the difference is only in the skipped constant. I made a mistake in my previous comment: it should have been just σ^2 instead of log(σ^2) in the denominator.
Why do you use categorical cross-entropy as the loss function for the categorical latent code in InfoGAN? The paper uses a lower bound on the mutual information (entropy minus conditional entropy), i.e.:

I(c; G(z,c)) = H(c) - H(c | G(z,c))

But using the cross-entropy H(c, Q(c | G(z,c))) between the input and output latent codes instead of the conditional entropy does not seem correct to me.