Open jchodera opened 1 year ago
Reminds one a bit of #566 / https://github.com/openforcefield/openff-toolkit/pull/762, no?
Thankfully, this doesn't seem to be mangling the chemistry of any molecules. The number of hydrogens on each heavy atom is already specified before the molecule enters the OpenFF ecosystem:
from openff.toolkit.topology import Molecule
from openeye import oechem
oemol = oechem.OEGraphMol()
oechem.OESmilesToMol(oemol, 'CCO')
for oeatom in oemol.GetAtoms():
print(oeatom.GetImplicitHCount())
3 2 1
The atom ordering/indices of the "existent" (non-implicit) atoms also seem to be preserved (though the OE docs don't explicitly guarantee this):
offmol = Molecule.from_openeye(oemol)
[atom.symbol for atom in offmol.atoms]
['C', 'C', 'O', 'H', 'H', 'H', 'H', 'H', 'H']
Making implicit hydrogens explicit is a major user convenience that I'd currently advocate for even if we didn't have existing behavior to consider. But we do have existing behavior to consider, and changing this to throwing an error would constitute a major behavior change and would break many user workflows. I'd be happy to resolve this with a PR that updates docstrings to say that implicit Hs become explicit. I don't think it's in scope for the OFF Toolkit to expand the Molecule class to support implicit Hs.
Making implicit hydrogens explicit is a major user convenience that I'd currently advocate for even if we didn't have existing behavior to consider.
Yes, absolutely---but it shouldn't happen silently. As I recommended above, if the user wants to use this convenience---which will alter the atom ordering/identities---the additional burden of requiring the user specify a flag prevents mistakes and misuse.
changing this to throwing an error would constitute a major behavior change and would break many user workflows.
Is there any evidence of this? And even if so, if folks are relying on a dangerous, silent behavior that modifies atom ordering---which is critically important for us to preserve---then they should update the behavior to make the request to modify the molecule explicit.
Again, this would be fully consistent with the current requirement that the user explcitly request that handle molecules with undefined stereochemistry since they may be altered as well.
The atom ordering/indices of the "existent" (non-implicit) atoms also seem to be preserved (though the OE docs don't explicitly guarantee this):
This is certainly not guaranteed to always be the case.
Describe the bug
Molecule.from_openeye()
silently adds hydrogens if they are implicit in this line rather than raising an Exception. This is dangerous behavior, since the main utility of the{to|from}_openeye()
methods is to transform the exact representation (and especially the exact atom numbering) between OpenFF and OpenEye toolkits. If the representation (and especially number of atoms and atom numberings) can silently change, this will cause major issues.This is already partly enforced by the fact that
Molecule.from_openeye()
requires the user to specifyallow_undefined_stereo=True
if the incoming representation isn't full explicit (in stereochemistry). It does not make sense to allow the representation to not be fully explicit in otherways without a user-specified override (e.g.allow_implicit_hydrogens=True
).To Reproduce
Output See above.
Computing environment (please complete the following information):
linux
x86_64
conda list
Additional context This was discovered in preparing bug reports for #1710 and #1711