Closed jthorton closed 1 year ago
Wow - nice catch. This causes tons of problems in the toolkit, actually. It's easy to find separate cases of the toolkit silently ignoring isotopes (the deuterium atoms below get the standard hydrogen mass) and of it choking on them:
>>> [atom.mass.m for atom in Molecule.from_smiles("[2H]O[2H]").atoms]
[1.007947, 15.99943, 1.007947]
>>> Molecule.from_smiles("[13C]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mattthompson/mambaforge/envs/smirnoff-plugins-test/lib/python3.9/site-packages/openff/toolkit/topology/molecule.py", line 1807, in from_smiles
    molecule = toolkit_registry.call(
  File "/Users/mattthompson/mambaforge/envs/smirnoff-plugins-test/lib/python3.9/site-packages/openff/toolkit/utils/toolkit_registry.py", line 356, in call
    raise e
  File "/Users/mattthompson/mambaforge/envs/smirnoff-plugins-test/lib/python3.9/site-packages/openff/toolkit/utils/toolkit_registry.py", line 352, in call
    return method(*args, **kwargs)
  File "/Users/mattthompson/mambaforge/envs/smirnoff-plugins-test/lib/python3.9/site-packages/openff/toolkit/utils/rdkit_wrapper.py", line 1035, in from_smiles
    molecule = self.from_rdkit(
  File "/Users/mattthompson/mambaforge/envs/smirnoff-plugins-test/lib/python3.9/site-packages/openff/toolkit/utils/rdkit_wrapper.py", line 1756, in from_rdkit
    raise RadicalsNotSupportedError(
openff.toolkit.utils.exceptions.RadicalsNotSupportedError: The OpenFF Toolkit does not currently support parsing molecules with S- and P-block radicals. Found 4 radical electrons on molecule [13C].
Without having thought about this much yet, I wonder if rolling a custom solution here would be easier than getting isotope support into the toolkit.
Erroring out if an isotope is passed through would be an improvement in the sense that unsupported behavior (even unintentionally unsupported) is handled more gracefully, but I figure that won't actually be an improvement in getting research done.
> I wonder if rolling a custom solution here would be easier than getting isotope support into the toolkit.
I agree that a custom solution here might be easier. I also wonder whether all SMIRKS matching might be better done using RDKit directly to save time, since the current workflow, as I understand it, goes: SMILES -> parse with RDKit/OpenEye -> convert to off-Mol -> substructure search with RDKit/OpenEye. When filtering 70k records in ThermoML, this could save some time and give the correct behaviour.
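As a rough sketch of what matching with RDKit directly might look like (skipping the conversion to an OpenFF Molecule), the snippet below parses each SMILES with RDKit, checks isotope labels through `Atom.GetIsotope()`, and runs a SMARTS substructure search. The helper names `has_isotope_label` and `matches_smarts` are hypothetical and are not part of Evaluator or the toolkit; the sketch assumes RDKit is installed and says nothing about how Evaluator actually implements its filters.

```python
from rdkit import Chem


def _parse(smiles: str) -> Chem.Mol:
    """Parse a SMILES string with RDKit, raising if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return mol


def has_isotope_label(smiles: str) -> bool:
    """Hypothetical helper: True if any atom carries an explicit isotope label."""
    # GetIsotope() returns 0 for atoms without an explicit isotope label.
    return any(atom.GetIsotope() != 0 for atom in _parse(smiles).GetAtoms())


def matches_smarts(smiles: str, smarts: str) -> bool:
    """Hypothetical helper: substructure search without an OpenFF Molecule."""
    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f"could not parse SMARTS: {smarts!r}")
    return _parse(smiles).HasSubstructMatch(query)


# Drop isotope-labelled entries from a small, made-up list of SMILES.
smiles_list = ["O", "[2H]O[2H]", "CCO", "[13CH4]"]
unlabelled = [s for s in smiles_list if not has_isotope_label(s)]
```

Whether this is actually faster than the current route on the full ThermoML set would need to be measured; the sketch only shows that the isotope information is available on the RDKit side.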
I was also a bit worried that Evaluator didn't depend on RDKit but I checked and it's explicitly listed; I guess the -base trick is just used to avoid AmberTools.
Mostly resolved with #503 / release v0.4.3. There might be performance optimizations left on the table; I didn't have the time to look deeply enough into that.
Filtering molecules with isotopes from ThermoML with Evaluator does not work, because the openff-toolkit molecules used during the filtering don't track whether an atom is an isotope or not.
Example
Would it be better to use RDKit for this function, or to provide a specific filter that handles this with a simple string search like the following?
mol_smiles.find("[2H]")
conda list openff
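A minimal sketch of the string-search idea, using only the standard library. Note that `str.find` returns -1 when nothing is found, which is truthy, so a bare `if mol_smiles.find("[2H]"):` would misfire; the sketch returns a real boolean instead. It also generalises from `[2H]` to any bracket atom that starts with an isotope number. The function name `smiles_has_isotope` is hypothetical, and the pattern assumes standard bracket-atom SMILES (it will not catch non-standard spellings such as a bare `D` for deuterium).

```python
import re

# A bracket atom with an isotope label starts with "[", then digits,
# then the element symbol, e.g. "[2H]" or "[13CH4]". Ring-closure digits
# ("C1CC1") and atom-map numbers ("[CH3:1]") do not directly follow "[".
_ISOTOPE_ATOM = re.compile(r"\[\d+[A-Za-z]")


def smiles_has_isotope(mol_smiles: str) -> bool:
    """Hypothetical filter: True if the SMILES contains an isotope-labelled atom."""
    return _ISOTOPE_ATOM.search(mol_smiles) is not None


# Keep only the entries without isotope labels (made-up example data).
records = ["O", "[2H]O[2H]", "C1CC1", "[13C]", "[NH4+]", "[CH3:1]O"]
kept = [s for s in records if not smiles_has_isotope(s)]
```

This avoids parsing each molecule at all, at the cost of trusting that the SMILES strings in the data set are written in a consistent form.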