maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.

KL divergence term in loss function #59

Open hutchisonc opened 7 years ago

hutchisonc commented 7 years ago

The loss function for the autoencoder is calculated here using

`kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)`

taking the mean over the dimensions of the latent representation. However, several other sources, including the VAE example in the keras repo, use the sum instead:

`kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)`

Is there a reason for the difference? Given the relatively large number of latent dimensions, it seems like this would significantly impact the strength of the KL regularization.
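For concreteness, here is a minimal sketch of the two variants side by side (assuming the Keras backend imported as `K`; the helper names `kl_mean`/`kl_sum` are illustrative, not from the repo):

```python
import keras.backend as K

def kl_mean(z_mean, z_log_var):
    # KL term averaged over latent dimensions (this repo's formulation).
    return -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var),
                         axis=-1)

def kl_sum(z_mean, z_log_var):
    # Analytic KL( N(z_mean, exp(z_log_var)) || N(0, I) ), summed over
    # latent dimensions (the Keras VAE example's formulation).
    return -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var),
                        axis=-1)

# The two differ only by a constant factor equal to the latent dimensionality:
#   kl_sum(z_mean, z_log_var) == latent_dim * kl_mean(z_mean, z_log_var)
# so with a latent space of hundreds of dimensions, the mean version applies a
# correspondingly weaker KL penalty relative to the reconstruction term.
```

One reading of the design choice: taking the mean keeps the KL term's scale independent of the latent size, but since the reconstruction loss is not rescaled to match, it effectively down-weights the regularization by a factor of `latent_dim` compared with the summed form.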