maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.
MIT License

Issue replicating graph #71

Open hstone1 opened 7 years ago

hstone1 commented 7 years ago

I would like to generate a picture akin to the one displayed in the README. However, even though I train my model to convergence beyond the point reported there, I do not get the distinct striations shown. Instead I get a more spread-out plot with only faint striations:

(attached image: spread-out 2D latent-space scatter)

Has anyone been able to replicate the image as displayed in the paper and README? Was it generated using an actual 2D latent dimension, or a higher-dimensional latent space then reduced to 2D with PCA? I have tried both and neither has worked; any help would be greatly appreciated.

osmelu commented 6 years ago

Same problem here. I've trained the model on the 500k ChEMBL data set using a 292-dimensional latent space, and after 30 epochs I got loss: 0.4956 and acc: 0.955. However, I'm far from that performance when using a 2D latent space (loss: 2.8712, acc: 0.7075 after 30 epochs). This is how the data looks in the 2D latent space:

```python
from pylab import figure, scatter, show

x_latent = model.encoder.predict(data_train)
figure(figsize=(6, 6))
scatter(x_latent[:, 0], x_latent[:, 1], marker='.')
show()
```
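For anyone wanting to run the plotting step in isolation: here is a self-contained equivalent using `matplotlib.pyplot` directly instead of the `pylab` interface. Since `model` and `data_train` come from the repo's training setup, random 2D codes stand in for `model.encoder.predict(data_train)` in this sketch.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Stand-in for model.encoder.predict(data_train): random 2D latent codes.
x_latent = np.random.RandomState(0).randn(5000, 2)

plt.figure(figsize=(6, 6))
plt.scatter(x_latent[:, 0], x_latent[:, 1], marker='.', s=1)
plt.xlabel('latent dim 1')
plt.ylabel('latent dim 2')
plt.savefig('latent_2d.png', dpi=150)
```

With the real encoder, replace the random array with `model.encoder.predict(data_train)`; the rest is unchanged.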

(attached image: 2D latent-space scatter)

And this is how it looks using the first two principal components of the 292-dimensional latent space:

```python
from pylab import figure, scatter, show
from sklearn.decomposition import PCA

x_latent = model.encoder.predict(data_train)
pca = PCA(n_components=2)
x_latent_pca = pca.fit_transform(x_latent)
figure(figsize=(6, 6))
scatter(x_latent_pca[:, 0], x_latent_pca[:, 1], marker='.', s=1)
show()
```
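One diagnostic that may explain why a PCA projection looks spread out: check how much of the total variance the first two components actually capture, via scikit-learn's `explained_variance_ratio_` attribute. If that fraction is small, most of the structure lives in the remaining dimensions and the 2D scatter will be smeared. A minimal sketch (again with random 292-D codes standing in for `model.encoder.predict(data_train)`):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for model.encoder.predict(data_train): random 292-D latent codes.
rng = np.random.RandomState(0)
x_latent = rng.randn(1000, 292)

pca = PCA(n_components=2)
x_latent_pca = pca.fit_transform(x_latent)

# Fraction of total variance captured by the first two components;
# for a well-structured latent space this should be substantially
# larger than the 2/292 you'd expect from isotropic noise.
print(pca.explained_variance_ratio_.sum())
```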

(attached image: PCA projection of the 292-D latent space)

@hstone1 Could you please explain how you obtained the image shown above?

Any ideas on how to reproduce the figure displayed in the README and in the paper?