The loss function for the autoencoder is calculated here using
kl_loss = - 0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
taking the mean over the dimensions of the latent representation. However, several other sources, including the VAE example in the Keras repo, use the sum instead:
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
Is there a reason for the difference? Since the mean divides the summed term by the number of latent dimensions, and the latent space here is relatively large, this choice seems like it would significantly change the strength of the KL regularization relative to the reconstruction loss.
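To make the scale difference concrete, here is a minimal sketch (the batch size of 4 and latent_dim of 16 are arbitrary values chosen for illustration) showing that the summed KL term is exactly latent_dim times the averaged one:

import numpy as np
from tensorflow.keras import backend as K

# Illustrative only: a batch of 4 samples with a hypothetical 16-dim latent space.
latent_dim = 16
z_mean = K.constant(np.random.randn(4, latent_dim).astype("float32"))
z_log_var = K.constant(np.random.randn(4, latent_dim).astype("float32"))

kl_inner = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
kl_mean = -0.5 * K.mean(kl_inner, axis=-1)  # per-sample KL, averaged over dims
kl_sum = -0.5 * K.sum(kl_inner, axis=-1)    # per-sample KL, summed over dims

# The sum is exactly latent_dim times the mean, so the mean version applies
# a KL penalty that is latent_dim times weaker for the same reconstruction loss.
print(K.eval(kl_sum / kl_mean))  # prints ~[16. 16. 16. 16.]

So with, say, 16 latent dimensions, the mean formulation is equivalent to down-weighting the standard (summed) KL divergence by a factor of 16.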