maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.
MIT License

LogP calculations for 50k dataset #13

Open tantrev opened 7 years ago

tantrev commented 7 years ago

LogP calculations for the 50k dataset. Same order as original SMILES file.

maxhodak commented 7 years ago

What's the origin of these? Are these measured logPs based on looking up the SMILES? Or cLogP from rdkit? Something else?

tantrev commented 7 years ago

Sorry, I should have specified. The LogP values were generated using ChemAxon's "generatemd" program from this repository's "smiles_50k.h5" file.

maxhodak commented 7 years ago

What would be really great is if you could alter the smiles_50k.h5 file to include a clogp column with this data indexed to the right rows, instead of including this separately as a txt file. Then it would work with the --property_column parameter on the preprocessing script and the rest of the tooling here.
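A rough sketch of how that merge might look, assuming the text file holds one logP value per line in the same row order as the SMILES table. The HDF key `"table"`, the column name `clogp`, and the file paths are assumptions for illustration, not confirmed details of this repository; the HDF5 step also needs pandas with PyTables installed.

```python
from typing import Iterable, List


def parse_logp_lines(lines: Iterable[str]) -> List[float]:
    """Parse one logP value per line, ignoring blank lines."""
    values = []
    for line in lines:
        text = line.strip()
        if text:
            values.append(float(text))
    return values


def add_clogp_column(h5_in, logp_txt, h5_out, key="table", column="clogp"):
    """Attach logP values to the SMILES table as a new column.

    Assumes the values are listed in the same order as the table rows.
    The key and column names are hypothetical defaults.
    """
    # Imported lazily so the parser above works without pandas installed.
    import pandas as pd

    frame = pd.read_hdf(h5_in, key)
    with open(logp_txt) as handle:
        values = parse_logp_lines(handle)

    # A length mismatch means the rows cannot be aligned by position.
    if len(values) != len(frame):
        raise ValueError(
            "expected %d logP values, found %d" % (len(frame), len(values))
        )

    frame[column] = values  # a plain list is assigned positionally
    frame.to_hdf(h5_out, key=key, format="table", data_columns=True)
    return frame
```

Writing to a separate output file keeps the original smiles_50k.h5 untouched until the alignment has been checked. The new column name could then be passed to --property_column.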

hsiaoyi0504 commented 7 years ago

I am curious whether including data generated with ChemAxon would be a problem. Specifically, are there any license issues?