openforcefield / openff-benchmark

Comparison benchmarks between public force fields and Open Force Field Initiative force fields
MIT License

Change validation RMS deduplication cutoff to 0.2 A, gen-confs to 0.5 A #56

Closed j-wags closed 3 years ago

j-wags commented 3 years ago

This choice was made to ensure that cis-amide conformations and saturated ring puckers are sampled in simple molecules. Details of testing different RMSD cutoffs for conformer generation are here: https://docs.google.com/spreadsheets/d/1jjo3LhIN7RG55R2Fea7mnRLr96Rw6-9xaTdH_xliuGg/edit#gid=0

Full message:

We decided to: 1) lower the generate-conformers step's heavy-atom RMSD cutoff to a minimum of 0.5 A, and 2) set the validate step's conformer deduplication RMSD cutoff to 0.2 A.
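To illustrate how the two cutoffs differ in effect, here is a minimal sketch of greedy RMSD-based deduplication. It is not the openff-benchmark implementation: the helper names (`rmsd`, `deduplicate`) and the toy two-atom coordinates are hypothetical, and the RMSD assumes conformers that are already aligned and share the same atom ordering (a real pipeline would also need alignment and symmetry handling).

```python
import math


def rmsd(conf_a, conf_b):
    """Plain coordinate RMSD between two conformers.

    Each conformer is a list of (x, y, z) tuples in Angstrom.
    Assumption: same atom ordering, already aligned (no fitting is done here).
    """
    if len(conf_a) != len(conf_b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def deduplicate(conformers, cutoff):
    """Return indices of conformers kept by a greedy RMSD filter.

    A conformer is kept only if it is at least `cutoff` away from
    every conformer that has already been kept.
    """
    kept = []
    for i, conf in enumerate(conformers):
        if all(rmsd(conf, conformers[j]) >= cutoff for j in kept):
            kept.append(i)
    return kept


# Hypothetical toy data: one fixed atom, one atom displaced along y.
toy_conformers = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.4, 0.0)],  # ~0.28 A from the first
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],  # ~0.71 A from the first
]

kept_at_generation_cutoff = deduplicate(toy_conformers, cutoff=0.5)
kept_at_validation_cutoff = deduplicate(toy_conformers, cutoff=0.2)
```

In this toy case the 0.5 A cutoff discards the middle conformer while the 0.2 A cutoff keeps all three, which is the sense in which the looser validation cutoff lets closely spaced, user-supplied conformers through.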

With these new settings, only 158 molecules of the burn-in set yield a single conformer (down from 223 in Bill's original message). Around 80% of these 158 one-conformer molecules have no rotatable heavy-atom torsions at all. With change 1), RDKit's conformer generator will

  • sometimes produce multiple puckers for saturated rings (two puckers for burn-in molecules 89, 90, 95, 211 and 214, but still only one for 12, 85, 86, 124, 125, and 292)
  • produce cis- and trans- conformers for most amide bonds (cis- and trans- for burn-in molecules 55, 106, 120, 168, 206, 227)

However, I was wrong in my previous message: RDKit's EmbedMultipleConfs does not produce trans-ester conformers. That's why we are doing 2), which keeps the validation step highly permissive toward low-RMSD conformers of the same molecule. So while our current conformer generation method fails to produce trans-esters, this provides a path forward: the user can supply multiple conformers of an input molecule, even if those conformers have a lower RMSD than the conformer generator would otherwise allow. If you have a more robust conformer generation method or conformer database, it will be possible to feed in multiple ring puckers and trans-ester conformers directly.

Thanks for spotting this -- I'm optimistic that these changes will improve the quality of data in this study.

We're still discussing how this change could affect the analysis. Basically, we'll need to see what happens if many of these low-RMSD conformers are optimized by Psi4 into the same minimum. However, changes to the analysis code (if needed) can be made after the QM jobs have run, so this isn't a blocking issue at this stage.
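One way the analysis could later detect conformers that collapsed into the same minimum is to group the optimized geometries by RMSD. The sketch below is only an illustration under stated assumptions: `group_by_minimum`, the 0.05 A tolerance, and the toy coordinates are all hypothetical, and the RMSD again assumes pre-aligned geometries with identical atom ordering rather than anything taken from the actual analysis code.

```python
import math


def rmsd(conf_a, conf_b):
    """Coordinate RMSD in Angstrom; assumes aligned, identically ordered atoms."""
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def group_by_minimum(final_geometries, tolerance):
    """Group conformer indices whose optimized geometries nearly coincide.

    Each group's first index acts as its representative; a geometry joins
    the first group whose representative lies within `tolerance` of it.
    """
    groups = []
    for i, geometry in enumerate(final_geometries):
        for group in groups:
            if rmsd(geometry, final_geometries[group[0]]) < tolerance:
                group.append(i)
                break
        else:
            # No existing representative is close enough: start a new group.
            groups.append([i])
    return groups


# Hypothetical optimized geometries: the first two are nearly identical.
optimized = [
    [(0.0, 0.0, 0.0), (1.0, 0.00, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.01, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.00, 0.0)],
]

minima = group_by_minimum(optimized, tolerance=0.05)
```

Here the first two inputs end up in one group and the third in its own, so an analysis step could report two distinct minima from three starting conformers.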