Hi, I have the same question as you. Did you get the answer?
They are different because each goes through a different layer at the same level of the network. Through optimization, z_mean will come to represent the mean and z_log_var the log variance. Here I had another question: why learn the log of the variance rather than the variance (or standard deviation) directly? Because the log parameterization is more numerically stable; see the link below.
https://wiseodd.github.io/techblog/2016/12/10/variational-autoencoder/
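To make that concrete, here is a minimal sketch following the structure of the Keras example quoted later in this thread (the dimension values and the sampling helper here are illustrative, not copied verbatim): the two heads share the input h but have separate weights, and the sampling step interprets one output as the mean and the other as the log variance.

from keras.layers import Input, Dense, Lambda
from keras import backend as K

# Illustrative dimensions; the exact values are not important here
batch_size, original_dim, intermediate_dim, latent_dim = 100, 784, 256, 2

x = Input(batch_shape=(batch_size, original_dim))
h = Dense(intermediate_dim, activation='relu')(x)

# Two heads with the same shape but separate, independently learned weights
z_mean = Dense(latent_dim)(h)      # trained to act as the mean of q(z|x)
z_log_var = Dense(latent_dim)(h)   # trained to act as the log variance of q(z|x)

def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim))
    # exp(0.5 * log_var) is the standard deviation; the exp keeps it positive,
    # which is why the network can safely output an unconstrained log value
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

z = Lambda(sampling)([z_mean, z_log_var])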
Hi, thanks very much for the reply and the link. But may I ask another question? For the custom loss layer, there is a slight difference between your link and https://github.com/fchollet/keras/blob/master/examples/variational_autoencoder_deconv.py: the Keras example computes xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean). Why is original_dim multiplied in here?
Also, in the link you provided there are only sums, no means. Shouldn't recon = K.sum(K.binary_crossentropy(y_pred, y_true), axis=1) be a sum and then a mean?
I would appreciate it very much if you could reply.
See the source of metrics.binary_crossentropy; it returns the mean of the per-pixel cross-entropy:
def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)
So multiplying by original_dim to recover the sum is reasonable. But again, there are many ways of handling the losses; you can pick whichever works best as long as it works.
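As a toy sanity check (with made-up per-pixel values, not taken from the example), the mean-times-original_dim form and the plain sum agree:

import numpy as np

original_dim = 784
per_pixel_ce = np.random.rand(original_dim)          # made-up per-pixel cross-entropies for one sample

summed = per_pixel_ce.sum()                          # the "sum over pixels" form from the blog post
mean_times_dim = per_pixel_ce.mean() * original_dim  # the form used in the Keras example

assert np.isclose(summed, mean_times_dim)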
As described in "Variational Autoencoder: Intuition and Implementation": ... As we might already know, maximizing E[log P(X|z)] is a maximum likelihood estimation. We see it all the time in discriminative supervised models, for example Logistic Regression, SVM, or Linear Regression. In other words, given an input z and an output X, we want to maximize the conditional distribution P(X|z) under some model parameters. So we could implement it by using any classifier with input z and output X, then optimize the objective function using, for example, log loss or regression loss. ...
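To make the connection between maximizing log P(X|z) and the log loss concrete, here is a small sketch with made-up binary pixels and decoder probabilities (the arrays are illustrative, not from the example): for a Bernoulli decoder, the negative log-likelihood is exactly the binary cross-entropy.

import numpy as np
from scipy.stats import bernoulli

x = np.array([0, 1, 1, 0])            # made-up binary reconstruction target
p = np.array([0.1, 0.8, 0.9, 0.2])    # made-up decoder outputs, i.e. P(X_i = 1 | z)

# log P(X|z) under a Bernoulli decoder, summed over pixels
log_likelihood = bernoulli.logpmf(x, p).sum()

# Binary cross-entropy (log loss) on the same values
log_loss = -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Maximizing the likelihood is the same as minimizing the log loss
assert np.isclose(-log_likelihood, log_loss)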
z_mean and z_log_var look the same:

...
x = Input(batch_shape=(batch_size, original_dim))
h = Dense(intermediate_dim, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)
...

How is that possible?