openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
311 stars 90 forks source link

[Request] Isotopic smiles #974

Open lilyminium opened 3 years ago

lilyminium commented 3 years ago

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Molecule.from_smiles("[13C]").to_smiles() does not roundtrip because the isotope of carbon is ignored.

Describe the solution you'd like A clear and concise description of what you want to happen.

Track the isotope of SMILES, unless it's a specific design decision to neglect isotopes.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Continue not tracking isotopes.

Additional context Add any other context or screenshots about the feature request here.

>>> Molecule.from_smiles("[13C]").to_smiles()
'[C]'

Both toolkits used for SMILES can do this, it's just that the openff Molecule itself does not track which isotopes are associated with each atom.

>>> from rdkit import Chem
>>> Chem.MolToSmiles(Chem.MolFromSmiles("[13C]"))
'[13C]'
>>> from openeye import oechem
>>> oemol = oechem.OEGraphMol()
>>> oechem.OESmilesToMol(oemol, "[13C]")
>>> oechem.OECreateSmiString(oemol, oechem.OESMILESFlag_Isotopes)
'[13C]'
lilyminium commented 3 years ago

I guess the easiest way to do it would be to map it in properties as atom mapping is currently done.

adalke commented 3 years ago

This is tricky. The OpenFF Molecule builds on the simtk/openmmp element. That table doesn't support isotopes ... except for deuterium. It uses a table indexed by atomic number and symbol. It allows two symbols have the same atomic number, in which case it prioritizes the one with the lower mass, to handle the 'H' vs 'D' case and have atomic number 1 map to "H".

Handling isotopes correctly requires a table of (atomic number, isotope number) -> isotopic weight. That's what the various cheminformatics toolkits do.

Is that needed for OpenFF? Is it a meta-toolkit that abstracts away the other toolkits, or is its goal more focused on MM? If the latter, then do the force fields distinguish between, eg, ³⁵Cl vs ³⁷Cl? Or do they parameterize based on natural abundance?

adalke commented 3 years ago

I see the OpenFF toolkit almost never uses the atomic mass. Perhaps the right solution is to drop atom.mass entirely, and add an isotope property. I expect different forcefields will have different masses, perhaps because a FF designed now will use different weights than, say, a re-implementation of CHARMm23 when the weights weren't as well determined.

Then, if OpenFF users need a MW calculator, the toolkit API could gain a "get periodic table" function, which provides a meta-toolkit interface to the appropriate underlying toolkit functionality. Assuming that any tiny differences in isotopic weights or mass abundances is acceptable.

lilyminium commented 3 years ago

The implementation would differ based on whether isotopes should be chemically relevant or just labels, yeah. I'm not sure which direction OpenFF wants to go -- but ignoring SMILES does neglect a region of SMARTS pattern matching, and the OpenEye to_smiles function does explicitly want to include isotopes (https://github.com/openforcefield/openff-toolkit/blob/1ba11ebbafa67f7953b13f6c0af6183a5801f33e/openff/toolkit/utils/toolkits.py#L1727)

adalke commented 3 years ago

I believe it's okay to omit a region of SMARTS pattern matching for any part of SMARTS which isn't relevant to OpenFF.

It does not appear that OESMILESFlag_Isotopes was added because of a broader intent to support isotopes in OpenFF.

There is only a handful of mentions of "isotope". For example, openff/toolkit/tests/test_toolkits.py has "TODO: Add test and molecule functionality for isotopes"

The OESMILESFlag_Isotopes flag you pointed to was first added in commit e3ee30f727ec8174c233ad7b0a211d7b62f0d304 . The commit method says nothing about isotope support. It's dated Oct 15, 2018. Atom and bond stereo flags were added in 797566052e5823a67d797255c5ffb95a6e61e98d .

The current code looks like it was added in 920a94b08cb725b1cd066206b31e3dc052483987 . I see no mention of a specific reason to support isotopes. It's dated Mar 3, 2020 .

The comment about that line you pointed to says "this sets up the default settings following the old DEFAULT flag". However, the current OEChem documentation for DEFAULT says it's the same as "Canonical | AtomMaps | RGroups".

I don't know what "old" refers to there. I know OEChem has changed the default flags between 1.x and 2.x.

My guess is that when @j-wags added OESMILESFlag_Hydrogens he also added OESMILESFlag_Isotope with that thought that that would be useful. At some point there was a decision to be explicit about the flags used, rather than use OpenEye's "DEFAULT". And that's where we are.