maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.
MIT License

VAE part of model #61

Open fgvbrt opened 7 years ago

fgvbrt commented 7 years ago

Hi, it looks like this code actually trains a plain auto-encoder rather than a VAE. Here are the reasons:

1) `epsilon_std` is 0.01 (https://github.com/maxhodak/keras-molecules/blob/master/molecules/model.py#L58) when it should be 1. With such a small value it is safe to say there is almost no sampling.
2) The KL loss ends up very small because of the mean operation (https://github.com/maxhodak/keras-molecules/blob/master/molecules/model.py#L78). The mean is taken along the feature and sequence dimensions, but both should be summed to put the KL loss on the right scale relative to the cross-entropy loss.
3) The picture in the README also suggests this, because not all regions of the latent space are covered by points. The authors of the paper wrote that they observed the same behavior when they trained a plain auto-encoder.

Maybe it makes sense to simply train a plain autoencoder and compare the results. A sketch of points 1) and 2) follows below.
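Here is a minimal sketch of what the sampling layer and loss could look like with those two points addressed (`epsilon_std = 1` and a summed KL term). The names `latent_rep_size` and `max_length` mirror the repo's conventions, but this is an illustration, not a drop-in patch; Keras 1.x uses `std=` and `keras.objectives`, Keras 2 uses `stddev=` and `keras.losses`.

```python
from keras import backend as K
from keras import objectives  # keras.losses in Keras 2

def sampling(args, latent_rep_size=292, epsilon_std=1.0):
    # reparameterization trick: z = mu + sigma * eps, with unit-std noise
    z_mean, z_log_var = args
    batch_size = K.shape(z_mean)[0]
    epsilon = K.random_normal(shape=(batch_size, latent_rep_size),
                              mean=0., stddev=epsilon_std)  # std= in Keras 1.x
    return z_mean + K.exp(z_log_var / 2) * epsilon

def make_vae_loss(z_mean, z_log_var, max_length=120):
    # z_mean / z_log_var are the encoder output tensors
    def vae_loss(x, x_decoded_mean):
        x = K.flatten(x)
        x_decoded_mean = K.flatten(x_decoded_mean)
        xent_loss = max_length * objectives.binary_crossentropy(x, x_decoded_mean)
        # sum (not mean) the KL term over the latent dimensions so it is on a
        # comparable scale to the reconstruction term
        kl_loss = -0.5 * K.sum(
            1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return xent_loss + kl_loss
    return vae_loss
```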

sbaurdlp commented 7 years ago

Hi,

I'm working on a similar problem but with protein sequences rather than molecules

You mention that epsilon_std is not 1, which also seems quite strange to me. Yet I found this was often the case in other code (for example the Keras tutorial on VAEs). When I changed mine from 1.0 to 1e-3 some months ago, it allowed the model to learn (it didn't before).

Would you say VAEs aren't suited for this kind of problem?

Regards, Sebastien

larry0x commented 5 years ago

Hi Sebastien, do you have any update on the issue regarding epsilon_std?

I am trying to implement the same model in PyTorch and encountered the same problem. If I set epsilon_std to 1, the model refuses to learn anything (the loss stagnates at very high values).

If I change this value to 0, the VAE effectively degenerates to a plain AE. It learns very fast, recovering input sequences almost perfectly. But just like any other plain AE, the latent space it produces is sparse, and it generates garbage when interpolating or decoding randomly sampled latent variables.

If I pick small, non-zero epsilon_std values, the result is somewhere in between: the model learns better than with epsilon_std set to 1, but not as well as with it set to zero. In none of these cases does the model work as well as described in Aspuru-Guzik's paper.
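For concreteness, here is a minimal PyTorch sketch of the sampling step being discussed. The `epsilon_std` argument is the knob in question, and the function names are illustrative rather than taken from the repo.

```python
import torch

def reparameterize(z_mean, z_log_var, epsilon_std=1.0):
    # z = mu + epsilon_std * eps * sigma, with eps ~ N(0, I)
    # epsilon_std = 1.0 -> standard VAE sampling
    # epsilon_std = 0.0 -> deterministic, i.e. a plain autoencoder
    eps = torch.randn_like(z_mean) * epsilon_std
    return z_mean + torch.exp(0.5 * z_log_var) * eps

def kl_loss(z_mean, z_log_var):
    # KL(q(z|x) || N(0, I)), summed over latent dims, averaged over the batch
    return torch.mean(
        -0.5 * torch.sum(1 + z_log_var - z_mean.pow(2) - z_log_var.exp(), dim=1))
```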

chaoyan1037 commented 5 years ago

@lyu18 I encountered the same problem as you. I looked into the code of the original paper and found that they anneal epsilon_std. Maybe this can help the model training; I will try it shortly.
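For anyone who wants to try the same thing, one simple way to anneal epsilon_std in Keras is a callback that updates a backend variable read by the sampling layer. The linear schedule below is only an illustration, not the schedule used in the paper's code, and the names are mine.

```python
from keras import backend as K
from keras.callbacks import Callback

epsilon_std_var = K.variable(0.01)  # the sampling layer must read this variable

class EpsilonStdAnnealer(Callback):
    """Ramp epsilon_std from `start` to `end` over `n_epochs` epochs."""
    def __init__(self, var, start=0.01, end=1.0, n_epochs=20):
        super(EpsilonStdAnnealer, self).__init__()
        self.var, self.start, self.end, self.n_epochs = var, start, end, n_epochs

    def on_epoch_end(self, epoch, logs=None):
        # linear ramp, clipped at `end`
        frac = min(1.0, float(epoch + 1) / self.n_epochs)
        K.set_value(self.var, self.start + frac * (self.end - self.start))

# usage: model.fit(..., callbacks=[EpsilonStdAnnealer(epsilon_std_var)])
```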

larry0x commented 5 years ago

@allenallen1037 That makes a lot of sense! Please let me know if you get any results. Thanks

chaoyan1037 commented 5 years ago

@lyu18 It helps improve the reconstruction accuracy during training. This is expected, since it is a kind of tradeoff between an AE and a VAE. But the KL divergence loss is quite large, which means the latent space may not be smooth. I will investigate more once training finishes.
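Another knob for that AE/VAE tradeoff (not necessarily what the paper's code does) is an annealed weight on the KL term, beta-VAE style, so the reconstruction term dominates early in training. A minimal sketch:

```python
from keras import backend as K

# annealable weight on the KL term; push it from 0.0 toward 1.0 during
# training (e.g. with K.set_value from a callback, as in the sketch above)
kl_weight = K.variable(0.0)

def weighted_vae_loss(xent_loss, kl_loss):
    # kl_weight = 0 -> plain autoencoder objective
    # kl_weight = 1 -> full VAE objective
    return xent_loss + kl_weight * kl_loss
```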

maxime-langevin commented 5 years ago

@allenallen1037 I've encountered the same problem as you. Did you find a workaround that helped you solve it? Thanks