openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
311 stars 90 forks source link

Unexpected `Unspecified Stereochemistry Error` in Molecule.from_rdkit #1410

Open wutobias opened 2 years ago

wutobias commented 2 years ago

Describe the bug I have an rdkit molecule object in which I have to set custom SetAtomMapNum for all atoms. Whenever I want to parse that rdkit object to an openff-toolkit Molecule object, I get an UndefinedStereochemistryError. This happens for molecules who don't even have steorecenters by construction and only if I add hydrogen atoms to them. If I set every atommap to 0 on every atom, the erroneous behaviour is not observed.

To Reproduce

from openff.toolkit.topology.molecule import Molecule
from rdkit.Chem import AllChem as Chem

### The following will crash with `UndefinedStereochemistryError`
rdmol = Chem.MolFromSmiles("CCCC")
rdmol = Chem.AddHs(rdmol)
for i, a in enumerate(rdmol.GetAtoms()):
    a.SetAtomMapNum(i)
offmol = Molecule.from_rdkit(rdmol)

### The following will not crash
rdmol = Chem.MolFromSmiles("CCCC")
rdmol = Chem.AddHs(rdmol)
for i, a in enumerate(rdmol.GetAtoms()):
    a.SetAtomMapNum(0)
offmol = Molecule.from_rdkit(rdmol)

Output

---------------------------------------------------------------------------
UndefinedStereochemistryError             Traceback (most recent call last)
Input In [54], in <cell line: 10>()
      7     a.SetAtomMapNum(i)
      8     #a.SetAtomMapNum(0)
---> 10 offmol = Molecule.from_rdkit(rdmol)

File ~/progs/miniconda3/envs/atomtyping/lib/python3.9/site-packages/openff/toolkit/utils/base_wrapper.py:51, in ToolkitWrapper.requires_toolkit.<locals>.decorator.<locals>.wrapped_function(*args, **kwargs)
     47     msg = "This function requires the {} toolkit".format(
     48         cls._toolkit_name
     49     )
     50     raise ToolkitUnavailableException(msg)
---> 51 value = func(*args, **kwargs)
     52 return value

File ~/progs/miniconda3/envs/atomtyping/lib/python3.9/site-packages/openff/toolkit/topology/molecule.py:5110, in FrozenMolecule.from_rdkit(cls, rdmol, allow_undefined_stereo, hydrogens_are_explicit)
   5079 """
   5080 Create a Molecule from an RDKit molecule.
   5081 
   (...)
   5107 
   5108 """
   5109 toolkit = RDKitToolkitWrapper()
-> 5110 molecule = toolkit.from_rdkit(
   5111     rdmol,
   5112     allow_undefined_stereo=allow_undefined_stereo,
   5113     hydrogens_are_explicit=hydrogens_are_explicit,
   5114     _cls=cls,
   5115 )
   5116 return molecule

File ~/progs/miniconda3/envs/atomtyping/lib/python3.9/site-packages/openff/toolkit/utils/rdkit_wrapper.py:1458, in RDKitToolkitWrapper.from_rdkit(self, rdmol, allow_undefined_stereo, hydrogens_are_explicit, _cls)
   1455 Chem.AssignStereochemistry(rdmol, cleanIt=False)
   1457 # Check for undefined stereochemistry.
-> 1458 self._detect_undefined_stereo(
   1459     rdmol,
   1460     raise_warning=allow_undefined_stereo,
   1461     err_msg_prefix="Unable to make OFFMol from RDMol: ",
   1462 )
   1464 # Create a new OpenFF Molecule
   1465 offmol = _cls()

File ~/progs/miniconda3/envs/atomtyping/lib/python3.9/site-packages/openff/toolkit/utils/rdkit_wrapper.py:2280, in RDKitToolkitWrapper._detect_undefined_stereo(cls, rdmol, err_msg_prefix, raise_warning)
   2278 else:
   2279     msg = "Unable to make OFFMol from RDMol: " + msg
-> 2280     raise UndefinedStereochemistryError(msg)

UndefinedStereochemistryError: Unable to make OFFMol from RDMol: Unable to make OFFMol from RDMol: RDMol has unspecified stereochemistry. Undefined chiral centers are:
 - Atom C (index 0)
 - Atom C (index 1)
 - Atom C (index 2)
 - Atom C (index 3)

Computing environment (please complete the following information):

Additional context

j-wags commented 2 years ago

Thanks for the report @wutobias. I recall seeing something like this before and will look into the root cause here.

cc #412 #472

j-wags commented 2 years ago

And (whether or not this is the root cause of this issue) I'd strongly recommend upgrading from 0.10.2 to the newest OFF toolkit release (either 0.10.6 if you need the old API, or 0.11.1 if not). There have been several critical bugfixes since 0.10.2. For example, Sage doesn't correctly apply TIP3P charges to water in 0.10.2.

wutobias commented 2 years ago

And (whether or not this is the root cause of this issue) I'd strongly recommend upgrading from 0.10.2 to the newest OFF toolkit release (either 0.10.6 if you need the old API, or 0.11.1 if not). There have been several critical bugfixes since 0.10.2. For example, Sage doesn't correctly apply TIP3P charges to water in 0.10.2.

Thanks. These are very useful comments. For this issue however, neither v0.10.6 nor 0.11.1 resolve the problem.

j-wags commented 2 years ago

I've found that both removing AddHs or setting the explicitOnly kwarg to true seems to resolves this issue. But I have no idea why this makes a difference. The output molecules do seem to have the expected chemical graph though (all Hs explicit).

rdmol = Chem.MolFromSmiles("CCCC")
rdmol = Chem.AddHs(rdmol, explicitOnly=True)
for i, a in enumerate(rdmol.GetAtoms()):
    a.SetAtomMapNum(i)

offmol = Molecule.from_rdkit(rdmol)

works

rdmol = Chem.MolFromSmiles("CCCC")
#rdmol = Chem.AddHs(rdmol)
for i, a in enumerate(rdmol.GetAtoms()):
    a.SetAtomMapNum(i)

offmol = Molecule.from_rdkit(rdmol)

also works.

Would either of these work for you in the short term, @wutobias? Either way I'm going to keep digging into the root cause (probably something to do with our handling of implicit Hs in RDMols)

wutobias commented 2 years ago

Would either of these work for you in the short term, @wutobias?

Yes, that works for me in the short term. Thanks for looking into it!