tdeboissiere / DeepLearningImplementations

Implementation of recent Deep Learning papers
MIT License

cost function question #47

Closed by petertulala 7 years ago

petertulala commented 7 years ago

Why do you use categorical cross-entropy as the loss function for the categorical latent code in InfoGAN? The paper uses a lower bound on the mutual information (entropy minus conditional entropy), i.e.:

I(c, G(z,c)) = H(c) - H(c | G(z,c))

But using the cross-entropy H(c, Q(c | G(z,c))) between the input and output latent codes instead of the conditional entropy does not seem correct to me.
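For reference, a minimal sketch of the kind of loss I mean (my own illustration, not necessarily the repo's exact code), assuming the auxiliary head Q ends in a softmax over the categorical code:

from keras import backend as K

def categorical_code_loss(c_true, q_pred):
    # c_true: the sampled one-hot latent code c
    # q_pred: softmax output of the auxiliary network Q(c | G(z, c))
    # computes E[-log Q(c | G(z, c))], i.e. the cross-entropy in question
    return K.mean(K.categorical_crossentropy(c_true, q_pred))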

petertulala commented 7 years ago

My fault, using cross-entropy is totally fine, good job. It's shown later in the paper that the cross-entropy H(c, Q(c | G(z,c))) bounds the conditional entropy H(c | G(z,c)) from above, so H(c) - H(c, Q(c | G(z,c))) is a lower bound on the mutual information, and since the latent code distribution is fixed, H(c) can be treated as a constant.
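Writing the bound out for reference (my own summary of the argument, not a quote from the paper):

\begin{aligned}
I(c;\, G(z,c)) &= H(c) - H(c \mid G(z,c)) \\
  &\ge H(c) + \mathbb{E}_{x \sim G(z,c)}\, \mathbb{E}_{c' \sim P(c \mid x)}\big[\log Q(c' \mid x)\big] \\
  &= H(c) - H\big(c,\, Q(c \mid G(z,c))\big)
\end{aligned}

Since P(c) is fixed, H(c) is constant, so maximizing the bound is the same as minimizing the cross-entropy term, which for a categorical code is exactly the categorical cross-entropy loss.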

petertulala commented 7 years ago

Reopening, I reviewed the Gaussian loss function, but I think it's not correct. This is how the log-likelihood of a Gaussian should be calculated:

log N(x; mu, sigma^2) = -0.5 log(2 pi) - 0.5 log(sigma^2) - (x - mu)^2 / (2 sigma^2)

And this is how the Gaussian loss is calculated in train.py: [screenshot of the gaussian_loss implementation in train.py]

The log(2 pi) term may be skipped, since it is a constant, but I believe the rest is wrong. Correct implementation:

import numpy as np
from keras import backend as K

def gaussian_loss(y_true, y_pred):
    # y_true packs the target distribution parameters, y_pred the sampled value
    mean = y_true[:, 0, :]
    log_stdev = y_true[:, 1, :]  # note: exp(log_stdev) is used as the variance sigma^2
    x = y_pred[:, 0, :]

    # negative Gaussian log-likelihood: 0.5 * (log(2*pi) + log(sigma^2) + (x - mean)^2 / sigma^2)
    frac = K.square(x - mean) / (K.exp(log_stdev) + K.epsilon())
    return 0.5 * K.mean(np.log(2 * np.pi) + log_stdev + frac)
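A quick sanity check (my own sketch, assuming log_stdev stores log(sigma^2)): the loss above should match the average negative log-density from scipy up to K.epsilon().

import numpy as np
from scipy.stats import norm
from keras import backend as K

mu, sigma = 0.3, 1.5
x = np.random.randn(8, 1, 4).astype("float32")
y_true = np.stack([np.full((8, 4), mu),
                   np.full((8, 4), np.log(sigma ** 2))], axis=1).astype("float32")

loss = K.eval(gaussian_loss(K.constant(y_true), K.constant(x)))
reference = -norm.logpdf(x[:, 0, :], loc=mu, scale=sigma).mean()
print(loss, reference)  # the two values should agree closely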

I tested the fixed equation and this is the result: [results plot attached]

See also the implementation from OpenAI which calculates the loss the same way.

tdeboissiere commented 7 years ago

log(s^2) = 2 log(s). Hence the 0.5 factor goes away (1/2 * log(s^2) = log(s)). So I believe the current implementation is correct. It does ignore the constant factor as you noted.

Your new implementation is correct as well if you treat log_stdev as log(s**2).

petertulala commented 7 years ago

Thank you for the answer. However, in that case, shouldn't it be multiplied by 0.25 instead of 0.5?

[screenshot of my derivation]

I think if we ignore the constant, the correct loss should be:

epsilon = (y_true - Q_C_mean) / (K.exp(Q_C_logstd) + K.epsilon())
loss_Q_C = (Q_C_logstd + 0.25 * K.square(epsilon))

tdeboissiere commented 7 years ago

We have:

log((2pi s^2)^(-0.5) * e^(-0.5(x-mu)^2 / s^2)) = cst + log(s^(-1)) - 0.5 (x - mu) ^ 2 / s^2 = cst - (log(s) + 0.5 ((x - mu) / e^(log(s))) ^ 2

Which is the expression in my code
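A quick numeric check of that (my own sketch in plain numpy): the expression with log(s) and the full negative log-likelihood differ only by the constant 0.5 * log(2 pi).

import numpy as np

rng = np.random.RandomState(0)
x, mu, s = rng.randn(1000), 0.2, 1.7

# expression used in the code (constant dropped), parameterized by log(s)
code_expr = np.log(s) + 0.5 * ((x - mu) / s) ** 2
# full Gaussian negative log-likelihood
full_nll = 0.5 * np.log(2 * np.pi * s ** 2) + 0.5 * (x - mu) ** 2 / s ** 2

print(np.allclose(full_nll - code_expr, 0.5 * np.log(2 * np.pi)))  # True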

petertulala commented 7 years ago

Ok thanks, now I see both ways are correct; the difference is only in the skipped constant. I made a mistake in my previous comment: it should have been just σ^2 in the denominator, not log σ^2.