Open tantrev opened 7 years ago
What's the origin of these? Are these measured logPs based on looking up the SMILES? Or cLogP from rdkit? Something else?
Sorry, I should have specified. The LogP values were generated using ChemAxon's "generatemd" program from this repository's "smiles_50k.h5" file.
What would be really great is if you could alter the smiles_50k.h5 file to include a clogp column with this data indexed to the right rows, instead of including this separately as a txt file. Then it would work with the --property_column parameter on the preprocessing script and the rest of the tooling here.
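For anyone who wants to try this, here is a rough sketch of how the column could be attached with pandas. It assumes several things the thread doesn't confirm: that smiles_50k.h5 is a pandas HDF5 store readable with `pd.read_hdf`, that the store key is "table", and that the txt file holds one LogP value per line in the same row order as the SMILES. The function names and the output file name are made up for illustration.

```python
import pandas as pd


def read_logp_txt(path):
    """Read one float per non-empty line (assumed layout of the txt file)."""
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]


def attach_logp(df, logp_values, column="clogp"):
    """Return a copy of df with the values added as a new column.

    Values are assigned by position, so they must already be in the
    same order as the rows of df.
    """
    if len(logp_values) != len(df):
        raise ValueError(
            f"expected {len(df)} values, got {len(logp_values)}"
        )
    out = df.copy()
    out[column] = [float(v) for v in logp_values]
    return out


def add_column_to_h5(h5_in, txt_path, h5_out, key="table"):
    """Load the store, attach the column, and write a new store.

    The key "table" is an assumption; check the actual key in the file.
    Reading and writing HDF5 with pandas requires the PyTables package.
    """
    df = pd.read_hdf(h5_in, key)
    values = read_logp_txt(txt_path)
    attach_logp(df, values).to_hdf(h5_out, key=key, format="table")
```

Something like `add_column_to_h5("smiles_50k.h5", "logp.txt", "smiles_50k_clogp.h5")` would then produce a file whose clogp column could be passed to --property_column, assuming the preprocessing script reads the column by name. The length check is there because a silent off-by-one between the two files would misalign every row after it.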
I'm curious whether it would be a problem to include data generated by ChemAxon. I mean, are there any licensing issues?
LogP calculations for the 50k dataset. Same order as original SMILES file.