maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.
MIT License

Sharing trained weight #10

Closed: jeammimi closed this issue 7 years ago

jeammimi commented 8 years ago

Hello, I am very interested in testing your model. Do you think you could share the trained model file (the model.h5 file)? Thank you.

maxhodak commented 7 years ago

I need to train a new model given recent changes to model.py, but I'll include a trained model (via git-lfs, ~100 MB) once it's done.

jeammimi commented 7 years ago

OK, thanks. I tried to train the model on some molecules extracted from a patent database, and my accuracy is rather low: 0.72. The dataset has about 1 million molecules. What kind of accuracy do you get?

maxhodak commented 7 years ago

The pretrained model is now included in data/model_500k.h5. It works with data/smiles_500k.h5 (you must feed that data file through the preprocessing script first).

I was seeing an accuracy of about 0.98 when I stopped training that model (60 epochs, ~12 hours on a GTX 1080).

michaelosthege commented 7 years ago

Could someone post the charset that the model was trained on? I'm having difficulty reproducing the results (mostly due to the Python version). In my re-implementation I sort the charset and save it to a .json file alongside the trained model.
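
A minimal sketch of that sort-and-save approach, assuming the charset is simply the set of characters found in the SMILES strings plus a padding character. The function names, the `pad_char` parameter, and the file name are hypothetical and not taken from the repository. Sorting matters because the iteration order of a Python `set` of strings is not guaranteed to be the same across interpreter versions or runs, so an unsorted charset can map characters to different one-hot indices than the ones the model was trained with.

```python
import json


def build_charset(smiles_list, pad_char=" "):
    """Collect every character used in the SMILES strings, in sorted order.

    pad_char is an assumption: it stands for whatever character is used
    to pad strings to a fixed length before one-hot encoding.
    """
    chars = set(pad_char)
    for smiles in smiles_list:
        chars.update(smiles)
    # Sorting gives a deterministic character-to-index mapping.
    return sorted(chars)


def save_charset(charset, path):
    """Write the charset as a JSON list so it can be stored next to the model."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(charset, f)


def load_charset(path):
    """Read the charset back; the list order defines the one-hot indices."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `save_charset(build_charset(smiles), "charset.json")` at training time and `load_charset("charset.json")` at inference time would keep the encoding consistent, whichever Python version produced the file.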