maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.

KL divergence term in loss function #59

Open hutchisonc opened 7 years ago

hutchisonc commented 7 years ago

The loss function for the autoencoder is calculated here using

`kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)`

taking the mean over the dimensions of the latent representation. However, several other sources, including the VAE example in the keras repo, use the sum instead:

`kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)`

Is there a reason for the difference? Given the relatively large number of latent dimensions, it seems like this would significantly impact the strength of the KL regularization.
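For concreteness, here is a minimal sketch of the two variants side by side (assuming the Keras backend imported as `K`; the helper names `kl_mean`/`kl_sum` are illustrative, not from the repo):

```python
import keras.backend as K

def kl_mean(z_mean, z_log_var):
    # KL term averaged over latent dimensions (this repo's formulation).
    return -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var),
                         axis=-1)

def kl_sum(z_mean, z_log_var):
    # Analytic KL( N(z_mean, exp(z_log_var)) || N(0, I) ), summed over
    # latent dimensions (the Keras VAE example's formulation).
    return -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var),
                        axis=-1)

# The two differ only by a constant factor equal to the latent dimensionality:
#   kl_sum(z_mean, z_log_var) == latent_dim * kl_mean(z_mean, z_log_var)
# so with a latent space of hundreds of dimensions, the mean version applies a
# correspondingly weaker KL penalty relative to the reconstruction term.
```

One reading of the design choice: taking the mean keeps the KL term's scale independent of the latent size, but since the reconstruction loss is not rescaled to match, it effectively down-weights the regularization by a factor of `latent_dim` compared with the summed form.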