openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
302 stars 88 forks source link

Undefined stereo error depends on the toolkit. #725

Open jthorton opened 3 years ago

jthorton commented 3 years ago

Describe the bug Undefined stereo errors seem to depend on the backend toolkit used to create the molecule see the example below I think this is mainly due to nitrogen stereochemistry being picked up or not.

This also causes an error in QCSubmit when enumerating stereochemistry, in the workflow we have a component which enumerates all undefined stereochemistry and checks that the molecule is valid before adding it to a dataset by doing a round trip and looking for undefined stereo errors. In the case of this molecule, the nitrogen stereo chem is enumerated and 2 isomers are made but no new molecules are returned from the method as the toolkit makes sure to not return the same molecule and as nitrogen stereo chem is no longer checked in the isomorphism check the toolkit thinks the two new molecules are the same as the input and they are removed. Then when we do the round trip test on the input molecule it fails as openeye identifies missing stereochemistry meaning any input molecules with only undefined nitrogen stereochemistry are accidentally removed when building datasets.

To Reproduce

mol = Molecule.from_smiles("[H]c1c(c(c(c(c1C#N)[H])[H])N(C([H])([H])c2c(c(c(c(c2[H])Br)OS(=O)(=O)N([H])[H])[H])[H])N3C(=NN=C3[H])[H])[H]", allow_undefined_stereo=False, toolkit_registry=OpenEyeToolkitWrapper())

Problematic atoms are: Atom atomic num: 7, name: , idx: 11, aromatic: False, chiral: True with bonds: bond order: 1, chiral: False to atom atomic num: 6, name: , idx: 3, aromatic: True, chiral: False bond order: 1, chiral: False to atom atomic num: 6, name: , idx: 12, aromatic: False, chiral: False bond order: 1, chiral: False to atom atomic num: 7, name: , idx: 32, aromatic: False, chiral: False

mol = Molecule.from_smiles("[H]c1c(c(c(c(c1C#N)[H])[H])N(C([H])([H])c2c(c(c(c(c2[H])Br)OS(=O)(=O)N([H])[H])[H])[H])N3C(=NN=C3[H])[H])[H]", allow_undefined_stereo=False, toolkit_registry=RDKitToolkitWrapper())

Molecule with name '' and SMILES '[H]c1c(c(c(c(c1C#N)[H])[H])N(C([H])([H])c2c(c(c(c(c2[H])Br)OS(=O)(=O)N([H])[H])[H])[H])N3C(=NN=C3[H])[H])[H]'

j-wags commented 3 years ago

Linking to the uber-issue on this at #146.

I think we'll ultimately want to have our own "OFF stereochemistry" model, where we have SMARTS patterns for the atoms/bonds that WE consider stereogenic, and then add some complexity to the OE/RDKit interfaces to enforce this definition of sterechemistry (where hopefully our stereogenic groups are a subset of their stereogenic groups, so it'll be an exercise in selectively ignoring some stereo in those cheminformatics toolkits)