openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
309 stars 90 forks source link

Poor error messages on InChI parsing failures #1674

Closed mattwthompson closed 1 year ago

mattwthompson commented 1 year ago

Is your feature request related to a problem? Please describe.

As somebody who barely knows what they're doing, I get InChI and InChIKey identifiers confused, and the toolkit offers Molecule.to_inchi and Molecule.to_inchikey but only Molecule.from_inch. As a result, I find it's easy to pass a key to from_inchi which produces this unhelpful error message from a RuntimeError, which I'd argue is not an the exception that should be thrown:

RuntimeError: There was an issue parsing the InChI string, please check and try again.

It'd be nice to know which string is failing, and ideally also why. OpenEye happens to give a hint if you parse the whole traceback, RDKit does not:

In [17]: with _toolkit_registry_manager(RDKitToolkitWrapper()):
    ...:     Molecule.from_inchi("asdfijidjfidjifjdifj")
[12:06:32] ERROR:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[17], line 2
      1 with _toolkit_registry_manager(RDKitToolkitWrapper()):
----> 2     Molecule.from_inchi("asdfijidjfidjifjdifj")

File ~/mambaforge/envs/ib-dev/lib/python3.9/site-packages/openff/toolkit/topology/molecule.py:1657, in FrozenMolecule.from_inchi(cls, inchi, allow_undefined_stereo, toolkit_registry, name)
   1625 """
   1626 Construct a Molecule from a InChI representation
   1627
   (...)
   1653 >>> molecule = Molecule.from_inchi('InChI=1S/C2H2Cl2/c3-1-2-4/h1-2H/b2-1-')
   1654 """
   1656 if isinstance(toolkit_registry, ToolkitRegistry):
-> 1657     molecule = toolkit_registry.call(
   1658         "from_inchi",
   1659         inchi,
   1660         _cls=cls,
   1661         allow_undefined_stereo=allow_undefined_stereo,
   1662         name=name,
   1663     )
   1664 elif isinstance(toolkit_registry, ToolkitWrapper):
   1665     toolkit = toolkit_registry

File ~/mambaforge/envs/ib-dev/lib/python3.9/site-packages/openff/toolkit/utils/toolkit_registry.py:356, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
    354             for exception_type in raise_exception_types:
    355                 if isinstance(e, exception_type):
--> 356                     raise e
    357             errors.append((toolkit, e))
    359 # No toolkit was found to provide the requested capability
    360 # TODO: Can we help developers by providing a check for typos in expected method names?

File ~/mambaforge/envs/ib-dev/lib/python3.9/site-packages/openff/toolkit/utils/toolkit_registry.py:352, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
    350 method = getattr(toolkit, method_name)
    351 try:
--> 352     return method(*args, **kwargs)
    353 except Exception as e:
    354     for exception_type in raise_exception_types:

File ~/mambaforge/envs/ib-dev/lib/python3.9/site-packages/openff/toolkit/utils/rdkit_wrapper.py:1165, in RDKitToolkitWrapper.from_inchi(self, inchi, allow_undefined_stereo, _cls, name)
   1163 # try and catch an InChI parsing error
   1164 if rdmol is None:
-> 1165     raise RuntimeError(
   1166         "There was an issue parsing the InChI string, please check and try again."
   1167     )
   1169 # process the molecule
   1170 # TODO do we need this with inchi?
   1171 rdmol.UpdatePropertyCache(strict=False)

RuntimeError: There was an issue parsing the InChI string, please check and try again.

In [18]: with _toolkit_registry_manager(OpenEyeToolkitWrapper()):
    ...:     Molecule.from_inchi("asdfijidjfidjifjdifj")
Warning: OEParseInChI failed on 'asdfijidjfidjifjdifj'
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[18], line 2
      1 with _toolkit_registry_manager(OpenEyeToolkitWrapper()):
----> 2     Molecule.from_inchi("asdfijidjfidjifjdifj")

File ~/mambaforge/envs/ib-dev/lib/python3.9/site-packages/openff/toolkit/topology/molecule.py:1657, in FrozenMolecule.from_inchi(cls, inchi, allow_undefined_stereo, toolkit_registry, name)
   1625 """
   1626 Construct a Molecule from a InChI representation
   1627
   (...)
   1653 >>> molecule = Molecule.from_inchi('InChI=1S/C2H2Cl2/c3-1-2-4/h1-2H/b2-1-')
   1654 """
   1656 if isinstance(toolkit_registry, ToolkitRegistry):
-> 1657     molecule = toolkit_registry.call(
   1658         "from_inchi",
   1659         inchi,
   1660         _cls=cls,
   1661         allow_undefined_stereo=allow_undefined_stereo,
   1662         name=name,
   1663     )
   1664 elif isinstance(toolkit_registry, ToolkitWrapper):
   1665     toolkit = toolkit_registry

File ~/mambaforge/envs/ib-dev/lib/python3.9/site-packages/openff/toolkit/utils/toolkit_registry.py:356, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
    354             for exception_type in raise_exception_types:
    355                 if isinstance(e, exception_type):
--> 356                     raise e
    357             errors.append((toolkit, e))
    359 # No toolkit was found to provide the requested capability
    360 # TODO: Can we help developers by providing a check for typos in expected method names?

File ~/mambaforge/envs/ib-dev/lib/python3.9/site-packages/openff/toolkit/utils/toolkit_registry.py:352, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
    350 method = getattr(toolkit, method_name)
    351 try:
--> 352     return method(*args, **kwargs)
    353 except Exception as e:
    354     for exception_type in raise_exception_types:

File ~/mambaforge/envs/ib-dev/lib/python3.9/site-packages/openff/toolkit/utils/openeye_wrapper.py:2066, in OpenEyeToolkitWrapper.from_inchi(self, inchi, allow_undefined_stereo, _cls, name)
   2063 # try and catch InChI parsing fails
   2064 # if there are no atoms don't build the molecule
   2065 if oemol.NumAtoms() == 0:
-> 2066     raise RuntimeError(
   2067         "There was an issue parsing the InChI string, please check and try again."
   2068     )
   2070 molecule = self.from_openeye(
   2071     oemol,
   2072     allow_undefined_stereo=allow_undefined_stereo,
   2073     _cls=_cls,
   2074 )
   2075 molecule.name = name

RuntimeError: There was an issue parsing the InChI string, please check and try again.

Describe the solution you'd like

Give me a more descriptive error about which string failed.

It'd also be nice to catch the case of me passing a key when the toolkit expects a full identifier, and vice versa.

Describe alternatives you've considered

Users can handle this themselves, but the toolkit should handle stuff like this - it's largely the point of why it wraps other toolkits to begin with.

Additional context

mattwthompson commented 1 year ago

1675