openforcefield / openff-benchmark

Comparison benchmarks between public force fields and Open Force Field Initiative force fields
MIT License

Implicit hydrogen molecules #78

Open jthorton opened 3 years ago

jthorton commented 3 years ago

Some molecules from the benchmark partner sets contain implicit hydrogens in the SDF. These should have been filtered out at the validation stage, but due to a bug in RDKit/the toolkit they were included. This causes problems because the cmiles has no numeric tag on the implicit hydrogens, so they are dropped when the molecule is remade. I searched the public dataset by trying to load each molecule from its cmiles with OpenEye, which raises an error for the affected entries. The IDs of those entries are attached; they correspond to two partner IDs, GNT and MRK, which may indicate that those partners' internal molecules also suffer from this issue.

unique.json.zip

cc @j-wags @dotsdl
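For illustration, here is a minimal sketch of how such entries could be flagged without a cheminformatics toolkit. It assumes the cmiles is a mapped SMILES in which every atom, hydrogens included, should appear as its own bracket atom with a map index (for example `[H:5]`), so a hydrogen count folded into a heavy atom (for example `[CH3:1]` or `[C@@H:1]`) marks an implicit hydrogen with no tag of its own. This is not the OpenEye-based check described above; the function name `implicit_hydrogen_atoms` and the example strings are hypothetical, and the regular expression only covers the common bracket-atom form (chirality classes such as `@TH1` are not handled).

```python
import re

# Common SMILES bracket-atom form:
#   [ isotope? symbol chirality? hcount? charge? :map? ]
# Assumption: chirality is only "@" or "@@"; anything else is skipped.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d*)"
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"
    r"(?P<chiral>@{0,2})"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>[+-]+\d*)?"
    r"(?P<map>:\d+)?\]"
)


def implicit_hydrogen_atoms(mapped_smiles):
    """Return the bracket atoms that carry an implicit hydrogen count."""
    flagged = []
    for match in BRACKET_ATOM.finditer(mapped_smiles):
        # "H0" explicitly states zero implicit hydrogens, so it is not flagged.
        if match.group("hcount") not in (None, "H0"):
            flagged.append(match.group(0))
    return flagged


# Hypothetical examples (methanol written two ways).
fully_explicit = "[H:3][C:1]([H:4])([H:5])[O:2][H:6]"
with_implicit = "[CH3:1][O:2][H:3]"

print(implicit_hydrogen_atoms(fully_explicit))  # []
print(implicit_hydrogen_atoms(with_implicit))   # ['[CH3:1]']
```

A string-level check like this is only a quick screen; a toolkit round trip remains the more reliable way to confirm that a molecule survives being remade from its cmiles.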