A new paper came out: https://arxiv.org/abs/1703.01925
They implemented a VAE that autoencodes SMILES strings by modeling SMILES as a context-free grammar, which is a pretty good approximation. They improved over the GB SMILES VAE. This would solve several of the issues that have been filed, such as #31 and #54.
The Grammar VAE code is at https://github.com/mkusner/grammarVAE. In terms of the actual model code, it looks a lot like the VAE code we already have. Incorporating the `zinc_grammar` and the masking shouldn't be too difficult -- just some work to implement the masking in Theano.
In fact, their code also uses `mean` and not `sum` in the KL loss term, which is relevant to #59.
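
To make the two pieces above concrete, here is a rough sketch of what the masking and the `mean` vs `sum` choice amount to. It is plain Python rather than Theano, and it is not taken from the grammarVAE repo: the function names `masked_softmax` and `kl_to_standard_normal`, the toy logits, and the 4-rule mask are all made up for illustration. The idea, as I understand it, is that the decoder's scores over production rules are renormalized over only the rules the grammar allows at that step, and that the KL term can be reduced over latent dimensions either way.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over `logits`, restricted to positions where `mask` is nonzero.

    Hypothetical helper: `mask` marks which production rules the grammar
    allows at the current decoding step (1 = allowed, 0 = forbidden).
    """
    if not any(mask):
        raise ValueError("mask must allow at least one production rule")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(x for x, m in zip(logits, mask) if m)
    weights = [math.exp(x - peak) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_to_standard_normal(z_mean, z_log_var, reduce="mean"):
    """KL(N(mu, sigma^2) || N(0, 1)), reduced over latent dimensions.

    `reduce` selects `mean` or `sum`; the two differ only by a factor of
    the latent dimension, which changes the weight of the KL term
    relative to the reconstruction loss.
    """
    terms = [
        -0.5 * (1.0 + log_var - mu * mu - math.exp(log_var))
        for mu, log_var in zip(z_mean, z_log_var)
    ]
    if reduce == "sum":
        return sum(terms)
    if reduce == "mean":
        return sum(terms) / len(terms)
    raise ValueError("reduce must be 'mean' or 'sum'")


# Toy example: 4 production rules, only rules 1 and 2 are valid here.
toy_logits = [2.0, 1.0, 0.5, -1.0]
toy_mask = [0, 1, 1, 0]
probs = masked_softmax(toy_logits, toy_mask)
print("masked probabilities:", probs)

# Toy latent code with 2 dimensions.
toy_mean = [0.5, -0.2]
toy_log_var = [0.0, -1.0]
print("KL (mean):", kl_to_standard_normal(toy_mean, toy_log_var, "mean"))
print("KL (sum): ", kl_to_standard_normal(toy_mean, toy_log_var, "sum"))
```

Forbidden rules get probability exactly zero, so sampling or argmax over the masked output can never pick a rule the grammar disallows. In Theano the same thing would presumably be an elementwise multiply of the exponentiated logits by the mask followed by a renormalization, but that is an assumption about how we would port it, not a description of their code.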