mkusner / grammarVAE

Code for the "Grammar Variational Autoencoder" https://arxiv.org/abs/1703.01925

how to sample from the generative model #22

Closed fdamani closed 3 years ago

fdamani commented 5 years ago

Hello,

I'd like to sample a batch of molecules from a pretrained GrammarVAE. Using encode_decode_zinc.py as inspiration, I first load the grammar weights and build the grammar model. I then draw samples from a standard Normal and call the grammar model's decode function on each sample.

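import numpy as np
import molecule_vae  # module from the grammarVAE repository

grammar_weights = "pretrained/zinc_vae_grammar_L56_E100_val.hdf5"
grammar_model = molecule_vae.ZincGrammarModel(grammar_weights)
latent_rep_size = 56
epsilon_std = 1.0
batch_size = 1000
# Draw latent vectors from the prior, one row per sample.
prior_z_samples = np.random.normal(loc=0.0, scale=epsilon_std, size=(batch_size, latent_rep_size))
decoded_samples = []
for i in range(batch_size):
    # Decode one latent vector at a time, passed as a (1, latent_rep_size) array.
    decoded_samples.append(grammar_model.decode(prior_z_samples[i][None,:])[0])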

Does this seem like a reasonable way to get samples from the generative model? The code above runs, but the output molecules seem off. For example, if I compare the empirical distribution of QED scores for the sampled molecules against the empirical distribution of QED scores for the zinc dataset, the GrammarVAE distribution is highly overdispersed and has a lower average QED score.

mkusner commented 4 years ago

Sorry for my delay on this.

In practice, the encoder posterior will not perfectly match the prior N(0,1). What I'd do is pass the zinc dataset through the encoder, record the mean and standard deviation of the resulting latent vectors, then randomly sample from that distribution to get better samples from the model.
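
One way to read this suggestion is sketched below. It fits a per-dimension (diagonal) Gaussian to the encoded latent vectors and samples from that instead of from N(0,1). The helper names fit_latent_gaussian and sample_latents are hypothetical and not part of the repository. The array `encoded` is synthetic stand-in data so the snippet runs on its own. With the real model it would presumably come from something like grammar_model.encode(zinc_smiles), which is an assumption to check against encode_decode_zinc.py. Taking statistics over the encoded means only, and ignoring any per-molecule encoder variance, is also an assumption of this sketch.

```python
import numpy as np


def fit_latent_gaussian(latent_vectors):
    """Return the per-dimension mean and standard deviation of encoded latents."""
    latent_vectors = np.asarray(latent_vectors, dtype=float)
    return latent_vectors.mean(axis=0), latent_vectors.std(axis=0)


def sample_latents(mean, std, n_samples, seed=None):
    """Draw n_samples vectors from a diagonal Gaussian with the given mean and std."""
    rng = np.random.RandomState(seed)
    return rng.normal(loc=mean, scale=std, size=(n_samples, mean.shape[0]))


# Synthetic stand-in for the encoder output (NOT real ZINC encodings).
# Assumption: with the repository model, this array would be obtained by
# encoding the dataset, e.g. grammar_model.encode(zinc_smiles).
encoded = np.random.RandomState(0).normal(loc=0.3, scale=0.5, size=(5000, 56))

mean, std = fit_latent_gaussian(encoded)
z_samples = sample_latents(mean, std, n_samples=1000, seed=1)
```

Each row z_samples[i][None, :] could then be passed to grammar_model.decode in the same loop as in the question. A diagonal Gaussian ignores correlations between latent dimensions. Fitting a full covariance matrix would be a natural variation if those correlations turn out to matter.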