rxn4chemistry / rxnmapper

RXNMapper: Unsupervised attention-guided atom-mapping. Code complementing our Science Advances publication on "Extraction of organic chemistry grammar from unsupervised learning of chemical reactions" (https://advances.sciencemag.org/content/7/15/eabe4166).
http://rxnmapper.ai
MIT License

RDKit coordination "->" after SMILES sanitization creates invalid reaction #50

Closed frederik-sandfort1 closed 3 months ago

frederik-sandfort1 commented 5 months ago

RDKit changed its behavior and now displays the coordination to metal ions (e.g. the cations of carboxylate salts) as a dative bond. Because this notation includes "->", it is incompatible with rxnmapper's splitting of the reaction SMILES at ">" (see here).
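To make the incompatibility concrete, here is a minimal sketch that uses only plain string operations (no RDKit needed). The input string is the sanitized reaction SMILES reported in this issue; the naive `split(">")` is an assumption that stands in for the splitting described above, not a copy of the rxnmapper code.

```python
# Sanitized reaction SMILES as reported in this issue (contains "->").
sanitized = "COC(=O)CCBr.O=C([O-]->[K+])c1ccccc1>>COC(=O)CCOC(=O)Cc1ccccc1"

# A naive split on ">" also cuts inside the dative-bond marker "->".
parts = sanitized.split(">")

# A well-formed reaction SMILES should give 3 parts
# (reactants, agents, products); here there are 4.
print(len(parts))
for part in parts:
    print(repr(part))
```

The first reactant fragment ends in a dangling "-" and the rest of the salt is pushed into what would be read as the agents, so the reaction is no longer valid.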

Example for reaction sanitization:

from rdkit.Chem import AllChem

# Reactant salt written with an explicit bond between [O-] and [K+]
smi = "COC(=O)CCBr.O=C([O-][K+])c1ccccc1>>COC(=O)CCOC(=O)Cc1ccccc1"
rxn = AllChem.ReactionFromSmarts(smi, useSmiles=True)
# Sanitize reactants and products in place
AllChem.SanitizeRxnAsMols(rxn)
print(AllChem.ReactionToSmiles(rxn))
# Output: COC(=O)CCBr.O=C([O-]->[K+])c1ccccc1>>COC(=O)CCOC(=O)Cc1ccccc1

Example for molecule sanitization:

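from rdkit import Chem

smi = "COC(=O)CCBr.O=C([O-][K+])c1ccccc1>>COC(=O)CCOC(=O)Cc1ccccc1"
# Split into reactants, agents (empty here) and products
components = smi.split(">")
# Round-trip each component through RDKit to sanitize and canonicalize it
sanitized = [Chem.MolToSmiles(Chem.MolFromSmiles(component)) for component in components]
print(sanitized)
# Output: ['COC(=O)CCBr.O=C([O-]->[K+])c1ccccc1', '', 'COC(=O)CCOC(=O)Cc1ccccc1']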

The "->" token is likely not in the model's vocabulary (and thus not compatible with the current model version), so you could simply check for it and replace it with an empty string to stay compatible with the sanitization. Additionally, it could make sense to use RDKit's rxn.GetReactants(), rxn.GetAgents() and rxn.GetProducts() methods instead of the string splitting.
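A minimal sketch of the string-level workaround suggested here, in plain Python. The helper names `strip_dative_bonds` and `split_reaction_smiles` are hypothetical and are not part of rxnmapper, rxn-chemutils, or RDKit. Also stripping "<-" and splitting with a lookbehind are additions beyond what the comment proposes.

```python
import re
from typing import List


def strip_dative_bonds(smiles: str) -> str:
    """Remove the dative-bond markers "->" and "<-" (hypothetical helper)."""
    return smiles.replace("->", "").replace("<-", "")


def split_reaction_smiles(rxn_smiles: str) -> List[str]:
    """Split on ">" separators, skipping any ">" that is part of "->".

    Hypothetical helper: the negative lookbehind ignores a ">" that is
    directly preceded by "-".
    """
    return re.split(r"(?<!-)>", rxn_smiles)


sanitized = "COC(=O)CCBr.O=C([O-]->[K+])c1ccccc1>>COC(=O)CCOC(=O)Cc1ccccc1"

# Option 1: drop the markers so the string matches the pre-sanitization form.
cleaned = strip_dative_bonds(sanitized)
print(cleaned)

# Option 2: keep the markers but split only on real separators.
print(split_reaction_smiles(sanitized))
```

Option 1 gives back the notation with an ordinary bond between [O-] and [K+], which is the form the original input used. Option 2 keeps the dative bond, which only helps if the downstream tokenizer and model can handle "->".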

Just wanted to make you aware of this issue. Best, Frederik

@avaucher

avaucher commented 3 months ago

Hi Frederik,

Thanks for reporting this. Such reaction SMILES should now work with the new version of rxn-chemutils (see change in https://github.com/rxn4chemistry/rxn-chemutils/pull/30). In #53, I added a test to confirm.

Note that although it technically works, the current model was not trained on any reactions containing such dative bonds :)

frederik-sandfort1 commented 3 months ago

@avaucher thank you so much for implementing this. I will check it against our examples as well.

Also, thanks for the note. I am aware of this and will likely keep our current workflow of removing the dative bonds :)