openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
301 stars 88 forks source link

openff.toolkit.Molecule unable to parse molecule in .sdf file, but is ok with parsing from rdkit #1872

Closed David-Araripe closed 2 months ago

David-Araripe commented 2 months ago

I have a .sdf file that I want to parse using openff.toolkit.Molecule. In general, I don't have problems with reading molecules like this.

However, this was an exception. When trying to read the attached .sdf file with the method Molecule.from_file(fpath, allow_undefined_stereo=True), I got the following logging message:

[07:20:59] Explicit valence for atom # 2 O, 3, is greater than permitted
68 Explicit valence for atom # 2 O, 3, is greater than permitted

Followed by the error:

in FrozenMolecule.from_file(cls, file_path, file_format, toolkit_registry, allow_undefined_stereo)
   3912     mols = toolkit.from_file_obj(
   3913         file_obj,
   3914         file_format=file_format,
   3915         allow_undefined_stereo=allow_undefined_stereo,
   3916         _cls=cls,
   3917     )
   3919 if len(mols) == 0:
-> 3920     raise MoleculeParseError(f"Unable to read molecule from file: {file_path}")
   3921 elif len(mols) == 1:
   3922     return mols[0]

MoleculeParseError: Unable to read molecule from file: 68_27_aligned.sdf

Solves it for the moment A workaround that I found to work is loading the molecule by doing:

fpath = "68_27_aligned.sdf"
supplier = Chem.SDMolSupplier(fpath)[0]
molecule = Molecule.from_rdkit(supplier)

Is this the desired behavior from the Molecule class? If so, is there some parameter I can pass so that I can still read he molecule file using the from_file() method?

To reproduce the error & get the version I'm using

from openff import toolkit
from openff.toolkit 
toolkit._version.get_versions()
fpath = "68_27_aligned.sdf"
molecule = Molecule.from_file(fpath, allow_undefined_stereo=True)

Where the version details are:

{'date': '2024-04-12T12:59:25-0700',
 'dirty': False,
 'error': None,
 'full-revisionid': '3bb4d9faa0cb481770c0974a845581f98629fb45',
 'version': '0.16.0'}
David-Araripe commented 2 months ago

Forgot the send the file, sorry: 68_27_aligned.txt

mattwthompson commented 2 months ago

I don't think you're doing anything wrong; I can load it using the OEChem backend and it looks like a sensible molecule in Avogadro as well. The toolkit internally has a different way of going through to the SDF supplier which is probably the cause of this trivalent oxygen you're seeing (?)

https://github.com/openforcefield/openff-toolkit/blob/f282282c6c5fb14df4c479cb7c689984bf3952f0/openff/toolkit/utils/rdkit_wrapper.py#L1074-L1076 / https://github.com/openforcefield/openff-toolkit/blob/f282282c6c5fb14df4c479cb7c689984bf3952f0/openff/toolkit/utils/rdkit_wrapper.py#L1009

Specifically here's where the failure bubbles up from: https://github.com/openforcefield/openff-toolkit/blob/f282282c6c5fb14df4c479cb7c689984bf3952f0/openff/toolkit/utils/rdkit_wrapper.py#L1019-L1024

I don't know why this is happening - Jeff, Josh, or somebody else may drop by and actually answer your question 😅

David-Araripe commented 2 months ago

Indeed, there was something wrong with the molecule I sent. I added hydrogens at some point of processing, but I noticed that was wrong in the end, as my molecules had an "empty" valence configuration. Once I corrected this issue, I didn't see this bug anymore! Thank you very much for having a look into this!

If it's something you'd like to further investigate, fine, but from my side I'm fine with having this issue closed

j-wags commented 2 months ago

Thanks for writing in and for the update, @David-Araripe. I'll mark this as closed.