openforcefield / openff-benchmark

Comparison benchmarks between public force fields and Open Force Field Initiative force fields
MIT License

[WIP] Increase inter-conformer cutoff #45

Closed j-wags closed 3 years ago

j-wags commented 3 years ago
dfhahn commented 3 years ago

I just tested the new threshold. For my dataset comprising 259 molecules and 1053 conformers, validate reduces the set to 267 molecules and 349 conformers. Of the 704 conformers that were moved to error_mols, 10 failed (unspecified stereochemistry) and the other 694 were removed because they are duplicates. That's a big change from before, when all of the latter conformers were included. generate-conformers reduces the number of molecules to 265 (errors not specified) and increases the number of conformers to 1065. 79 molecules have only one conformer; the other 186 molecules have at least two. After looking at a couple of the molecules with only one conformer (rigid aromatic molecules), I agree with the new thresholds.
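The counts reported above can be cross-checked with a few lines of arithmetic. This is only a bookkeeping sketch: the variable names are illustrative and are not part of openff-benchmark.

```python
# Counts taken from the comment above; variable names are illustrative only.
input_conformers = 1053
validated_conformers = 349
failed_stereo = 10   # removed for unspecified stereochemistry
duplicates = 694     # removed as duplicates of a retained conformer

# Everything that did not survive validation ended up in error_mols.
removed = input_conformers - validated_conformers
assert removed == 704
assert removed == failed_stereo + duplicates

# After generate-conformers: molecules with one conformer plus molecules
# with at least two conformers should add up to the reported 265 molecules.
single_conformer_mols = 79
multi_conformer_mols = 186
total_mols = single_conformer_mols + multi_conformer_mols
assert total_mols == 265

print(f"removed={removed}, duplicate share of input={duplicates / input_conformers:.1%}")
```

The reported numbers are internally consistent: the two removal reasons account for all 704 removed conformers, and the two molecule groups account for all 265 molecules.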

j-wags commented 3 years ago

> For my dataset comprising 259 molecules and 1053 conformers, validate reduces the set to 267 molecules and 349 conformers

Hm, quick sanity check. Are the numbers 259 and 267 switched here?

That's a dramatic reduction in number of conformers, though I could imagine that PDB ligands would be biased toward the same conformers. Again as a sanity check, do the "deduplicated" conformers seem reasonable upon visual inspection?

j-wags commented 3 years ago

The error_mols/error_mol_X.txt file should contain the reason a molecule was considered an error and reference which molecule it is a duplicate of (to help with visual inspection of "redundant" conformers).
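As a rough illustration of how a cutoff-based filter can record which retained conformer each removed one duplicates, here is a minimal sketch. It is not the openff-benchmark implementation: the functions `rmsd` and `deduplicate`, the greedy strategy, and the toy coordinates are all assumptions, and the RMSD here is a plain coordinate RMSD with no alignment or symmetry handling.

```python
import math


def rmsd(a, b):
    """Plain coordinate RMSD between two equally sized lists of (x, y, z) tuples.

    Assumption: atoms are in the same order and the conformers are already
    aligned; no superposition or symmetry correction is performed.
    """
    if len(a) != len(b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def deduplicate(conformers, cutoff):
    """Greedy filter: keep a conformer only if it is farther than `cutoff`
    from every conformer kept so far.

    Returns (kept_indices, errors). `errors` maps each removed index to a
    human-readable reason naming the kept conformer it duplicates.
    """
    kept = []
    errors = {}
    for i, conf in enumerate(conformers):
        for j in kept:
            distance = rmsd(conf, conformers[j])
            if distance <= cutoff:
                errors[i] = (
                    f"duplicate of conformer {j} "
                    f"(RMSD {distance:.3f} <= cutoff {cutoff})"
                )
                break
        else:
            # No kept conformer was within the cutoff, so this one is new.
            kept.append(i)
    return kept, errors


# Toy data: three 2-atom "conformers"; the second is nearly identical to the first.
confs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
]
kept, errors = deduplicate(confs, cutoff=0.5)
for index, reason in errors.items():
    print(f"conformer {index}: {reason}")
```

With this toy data, raising the cutoff (for example to 2.0) also marks the third conformer as a duplicate of the first, which is the kind of effect a larger inter-conformer cutoff has on the retained set.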

dfhahn commented 3 years ago

> Hm, quick sanity check. Are the numbers 259 and 267 switched here?

No, this is correct. Some of the 259 original molecules were split up into different molecular entities. I guess due to different enantiomers or bond orders.
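A toy sketch of how the entity count can exceed the input molecule count: if conformers are grouped by a stereo-aware identifier rather than by their source name, one named molecule can split into several entities. The record names (`MOL-A`, `MOL-B`) are hypothetical, and the sketch assumes each conformer already carries a canonical isomeric SMILES computed elsewhere; it only compares strings.

```python
from collections import defaultdict

# Hypothetical input records: (source molecule name, isomeric SMILES).
# The first two SMILES are the two enantiomers of alanine.
records = [
    ("MOL-A", "C[C@H](N)C(=O)O"),
    ("MOL-A", "C[C@@H](N)C(=O)O"),
    ("MOL-A", "C[C@H](N)C(=O)O"),
    ("MOL-B", "c1ccccc1"),
]

by_name = defaultdict(list)    # grouping as in the original dataset
by_entity = defaultdict(list)  # grouping by stereo-aware identifier

for name, smiles in records:
    by_name[name].append(smiles)
    by_entity[smiles].append(name)

n_input = len(by_name)
n_entities = len(by_entity)
print(f"{n_input} input molecules -> {n_entities} molecular entities")
```

Here two named molecules become three entities because the conformers labelled `MOL-A` contain both enantiomers.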

dfhahn commented 3 years ago

> That's a dramatic reduction in number of conformers, though I could imagine that PDB ligands would be biased toward the same conformers. Again as a sanity check, do the "deduplicated" conformers seem reasonable upon visual inspection?

It's reasonable, but not really obvious. Attached is one example of a set of conformers (image: JAN-00000). The orange and the silver conformers are part of the set; the other four are error mols.

Here are the SDFs: JAN-00000-00.zip

j-wags commented 3 years ago

> Some of the 259 original molecules were split up into different molecular entities. I guess due to different enantiomers or bond orders.

That makes sense. Probably stereoisomers of pyramidal nitrogens, if I had to guess. It'll be a nice day when we can remove this behavior from the toolkit.

Thanks for trying this out, @dfhahn. Merging!