maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.
MIT License
519 stars, 146 forks

Updated pretrained model? #45

Closed: liamnaka closed this issue 5 years ago

liamnaka commented 7 years ago

First of all, I want to thank you all for your immense contributions to this library; it is truly helpful.

I don't have access to a powerful GPU, so training these models is very time-consuming. I encountered an error when testing the pretrained model_500k.h5 on smiles_50k.h5 with sample_gen.py.

I ran

python sample_gen.py smiles_50k.h5 data/model_500k.h5 --target autoencoder

and received the following error from within the load_weights_from_hdf5_group function:

ValueError: Shapes (9, 1, 56, 9) and (9, 1, 55, 9) are not compatible
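The one dimension that differs in that error (56 vs 55) is plausibly the charset size. The sketch below assumes, as the shapes suggest, that the first encoder layer is a 1D convolution with 9 filters of width 9 whose kernel is stored in the Keras 1.x TensorFlow layout (filter_length, 1, input_dim, nb_filter), where input_dim is the number of characters in the one-hot charset. The helper names are hypothetical and not part of keras-molecules; the check only mimics the message seen in the traceback.

```python
from typing import Tuple


def conv1d_kernel_shape(charset_size: int, filter_length: int = 9,
                        nb_filter: int = 9) -> Tuple[int, int, int, int]:
    """Assumed kernel shape of a Keras 1.x Convolution1D layer (TF ordering).

    The first layer sees a one-hot matrix of shape (max_length, charset_size),
    so its input_dim equals the charset size.
    """
    return (filter_length, 1, charset_size, nb_filter)


def check_weights_compatible(model_charset_size: int, saved_charset_size: int) -> None:
    """Raise a ValueError shaped like the traceback when the kernels differ."""
    model_shape = conv1d_kernel_shape(model_charset_size)
    saved_shape = conv1d_kernel_shape(saved_charset_size)
    if model_shape != saved_shape:
        raise ValueError(f"Shapes {model_shape} and {saved_shape} are not compatible")


# A model built for a 56-character charset cannot take weights trained on 55.
try:
    check_weights_compatible(56, 55)
except ValueError as err:
    print(err)  # Shapes (9, 1, 56, 9) and (9, 1, 55, 9) are not compatible
```

Under this assumption, no amount of retraining on a weak machine would change the error: only the charset size used to build the model has to match the one the weights were saved with.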

Perhaps an updated pretrained model is needed? The models I train myself compile, but they are very inaccurate because of my machine's limitations, so I suspect the problem lies with the provided model. I might be able to get an AWS server running to help out if needed.

Regards, Liam

pechersky commented 7 years ago

I think that error occurs because sample_gen.py assumes a fixed charset that differs from the one the pretrained model was trained with. I'll look into updating the pretrained model. In the meantime, you can try the older sample.py with the pretrained model.
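To see exactly where an assumed charset and a model's charset disagree, one could compare them before loading any weights. This is a sketch: charset_diff is a hypothetical helper, and the two charsets below are illustrative toys, not the ones actually used by sample_gen.py or model_500k.h5. Order matters as well as membership, since two charsets with identical characters in a different order would still map characters to different one-hot columns.

```python
from typing import Dict, List


def charset_diff(expected: List[str], actual: List[str]) -> Dict[str, object]:
    """Summarize how two charsets differ: missing/extra characters and ordering."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        "missing": sorted(expected_set - actual_set),  # in expected, not in actual
        "extra": sorted(actual_set - expected_set),    # in actual, not in expected
        # Identical characters in a different order still index differently.
        "same_order": expected == actual,
    }


# Illustrative charsets only; 'l' would come from a chlorine atom ("Cl").
model_charset = [" ", "C", "N", "O", "(", ")", "="]
script_charset = [" ", "C", "N", "O", "(", ")", "=", "l"]
print(charset_diff(model_charset, script_charset))
# {'missing': [], 'extra': ['l'], 'same_order': False}
```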

Kfir-Schreiber commented 6 years ago

Hi,

I'd like to join @liamnaka in thanking you; this project is truly helpful.

I get the same error when trying to use sample.py with the pretrained 500k model and the ChEMBL dataset. It seems the saved weights come from a model trained with a 55-character charset, whereas the ChEMBL charset has 56 characters.
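A one-character difference like this is what you would expect if the charset is rebuilt from whichever dataset is being processed: every distinct character across all SMILES strings, plus the padding character, becomes one one-hot dimension, so a dataset containing a single character the training data never used yields a charset one larger. The sketch below assumes SMILES are right-padded with spaces to a fixed length of 120 (an assumption about the preprocessing, not confirmed in this thread); build_charset is hypothetical and the SMILES lists are toy examples, not ChEMBL data.

```python
from typing import Iterable, List


def build_charset(smiles: Iterable[str], max_length: int = 120) -> List[str]:
    """Collect every distinct character, including the space used for padding."""
    chars = set()
    for s in smiles:
        chars.update(s.ljust(max_length))  # str.ljust pads with spaces
    return sorted(chars)  # sorted so each character's index is deterministic


train_smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
new_smiles = train_smiles + ["CS"]  # introduces one new character: 'S'

print(len(build_charset(train_smiles)), len(build_charset(new_smiles)))  # 8 9
```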

Would it be possible to upload the original charset that was used to train the model?
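With the original charset in hand, new data could be encoded against it rather than against a charset rebuilt from the new dataset, so the one-hot width would match the pretrained weights. A minimal sketch, assuming the charset is an ordered list of characters that includes the space used for padding and that inputs are padded to 120 characters (both assumptions); molecules containing characters outside the charset cannot be represented and are skipped. one_hot_encode is a hypothetical helper built on NumPy, and the charset shown is a toy, not the real 55-character one.

```python
from typing import List, Tuple

import numpy as np


def one_hot_encode(smiles: List[str], charset: List[str],
                   max_length: int = 120) -> Tuple[np.ndarray, List[str]]:
    """Encode SMILES against a fixed charset, skipping strings it cannot represent."""
    index = {ch: i for i, ch in enumerate(charset)}
    if " " not in index:
        raise ValueError("charset must contain the space padding character")
    # Keep only strings that fit and use characters the model has seen.
    kept = [s for s in smiles
            if len(s) <= max_length and all(ch in index for ch in s)]
    encoded = np.zeros((len(kept), max_length, len(charset)), dtype=np.float32)
    for row, s in enumerate(kept):
        for pos, ch in enumerate(s.ljust(max_length)):
            encoded[row, pos, index[ch]] = 1.0
    return encoded, kept


charset = [" ", "(", ")", "1", "=", "C", "O", "c"]  # toy charset for illustration
x, kept = one_hot_encode(["CCO", "CCBr", "c1ccccc1"], charset)
print(x.shape, kept)  # (2, 120, 8) ['CCO', 'c1ccccc1']
```

Skipping unrepresentable molecules is the simplest choice; the alternative of adding new characters would change the input width and bring back the shape mismatch.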

Thanks, Kfir