openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
318 stars 92 forks source link

Molecule.from_openeye() does not preserve SD tags for OEGraphMol objects #1711

Open jchodera opened 1 year ago

jchodera commented 1 year ago

Describe the bug Molecule.from_openeye(oemol) correctly preserves SD tags if oemol is an OEMol, but not if it is an OEGraphMol.

I'm not exactly sure why this is happening. Note that the input molecule is forcibly recast to OEMol in this line, but this should not destroy the SDData. Something in these intervening lines must somehow clear this data.

It's possible that the SDData could simply be pulled from the original molecule in these lines without issue.

To Reproduce

>>> from openff.toolkit.topology import Molecule
>>> offmol = Molecule.from_smiles('CCO')
>>> offmol.properties['annotation'] = 'ethanol'
>>> offmol.properties
{'annotation': 'ethanol'}
>>> oemol = offmol.to_openeye()
>>> from openeye import oechem
>>> for dp in oechem.OEGetSDDataPairs(oemol): print(dp.GetTag(), dp.GetValue())
... 
annotation ethanol
>>> oegraphmol = oechem.OEGraphMol(oemol)
>>> for dp in oechem.OEGetSDDataPairs(oegraphmol): print(dp.GetTag(), dp.GetValue())
... 
annotation ethanol
>>> offmol2 = Molecule.from_openeye(oegraphmol)
>>> offmol2.properties
{}
>>> offmol3 = Molecule.from_openeye(oemol)
>>> offmol3.properties
{'annotation': 'ethanol'}

Output See above.

Computing environment (please complete the following information):

Additional context This was discovered in preparing the bug report for #1710

mattwthompson commented 11 months ago

the input molecule is forcibly recast to OEMol ... but this should not destroy the SDData

That seems to be the culprit here

ipdb> [*oechem.OEGetSDDataPairs(protomer)][0].GetTag()
'annotation'
ipdb> [*oechem.OEGetSDDataPairs(protomer)][0].GetValue()
'acetic acid'
ipdb> oemol = oechem.OEMol(protomer)
ipdb> [*oechem.OEGetSDDataPairs(oemol)][0].GetTag()
*** IndexError: list index out of range
ipdb> [*oechem.OEGetSDDataPairs(oemol)][0].GetValue()
*** IndexError: list index out of range
mattwthompson commented 11 months ago

This does not happen generally with Molecule.from_openeye when provided a full oechem.OEMol:

In [1]: from openff.toolkit import Molecule

In [2]: offmol = Molecule.from_smiles('COOH')

In [3]: offmol.properties['annotation'] = 'acetic acid'

In [4]: Molecule.from_openeye(offmol.to_openeye()).properties
Out[4]: {'annotation': 'acetic acid'}

In [5]: type(offmol.to_openeye())
Out[5]: openeye.oechem.OEMol

so I assume that SD tags are meant to be preserved when interoperating between OEChem and the toolkit