rxn4chemistry / rxnmapper

RXNMapper: Unsupervised attention-guided atom-mapping. Code complementing our Science Advances publication on "Extraction of organic chemistry grammar from unsupervised learning of chemical reactions" (https://advances.sciencemag.org/content/7/15/eabe4166).
http://rxnmapper.ai
MIT License

Atom mapping for some reactions is wrong; how to fix this problem? #12

Closed xiaoqiangsheng2016 closed 2 years ago

xiaoqiangsheng2016 commented 3 years ago

For this reaction: OB(C1=CCCCC1)O.ClC2=CC=CC=C2>>C3(C4=CC=CC=C4)=CCCCC3

The output mapping is: Cl[c:3]1[cH:4][cH:5][cH:6][cH:7][cH:8]1.OB(O)[C:1]1=[CH:2][CH2:9][CH2:10][CH2:11][CH2:12]1>>[CH:1]1=[C:2]([c:3]2[cH:4][cH:5][cH:6][cH:7][cH:8]2)[CH2:9][CH2:10][CH2:11][CH2:12]1


pschwllr commented 3 years ago

You're right - the mapping is wrong in this specific case. The beauty of the approach is that it learned to map reactions without supervision (without having labelled examples). That said, it is not straightforward to improve and correct the model. Reporting wrongly mapped reactions like you just did is very helpful, as we will consider them for the next RXNMapper version.

Let me explain what happens by looking at the attention weights on the RXNMapper Demo. Consider product atoms 1 and 2 and their attention weights towards the reactant atoms:

image

Attention of product atom 1.

image

Attention of product atom 2.

We see that atom 2 has the stronger attention weight, so it is mapped first, and to the wrong reactant atom.
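To illustrate why the order of assignment matters, here is a minimal sketch of a greedy, strongest-first pairing. It is a simplification and not the RXNMapper implementation: the function name `greedy_map`, the atom labels, and the weights are all hypothetical, and the numbers are made up for illustration rather than read from the demo.

```python
def greedy_map(attention: dict[str, dict[str, float]]) -> list[tuple[str, str, float]]:
    """Pair product atoms with reactant atoms, strongest attention first.

    Simplified one-to-one greedy assignment; an assumption for illustration,
    not RXNMapper's actual algorithm.
    """
    # Flatten to (weight, product atom, reactant atom) and sort by weight, descending.
    candidates = sorted(
        ((weight, product, reactant)
         for product, row in attention.items()
         for reactant, weight in row.items()),
        reverse=True,
    )
    used_products, used_reactants, order = set(), set(), []
    for weight, product, reactant in candidates:
        # Skip pairs whose product or reactant atom is already assigned.
        if product in used_products or reactant in used_reactants:
            continue
        used_products.add(product)
        used_reactants.add(reactant)
        order.append((product, reactant, weight))
    return order


# Hypothetical weights (made up): both product atoms prefer reactant_b.
toy_attention = {
    "product_1": {"reactant_a": 0.40, "reactant_b": 0.60},
    "product_2": {"reactant_a": 0.10, "reactant_b": 0.90},
}

for product, reactant, weight in greedy_map(toy_attention):
    print(product, "->", reactant, weight)
```

In this toy case product_2 claims reactant_b first, so product_1 is left with reactant_a even though that was not its strongest weight. One confident but wrong first assignment can therefore propagate to the atoms mapped after it.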

One reason for this attention pattern is how the SMILES are canonicalised in this specific case. If we look at the wrongly mapped part of the SMILES, we have the following: ... C1=C ... >> C1=C ....

RXNMapper incorrectly maps this to ... [C:1]1=[CH:2] ... >> [C:1]1=[CH:2] .... Due to the canonicalisation, the two carbons in the SMILES switch sides around the 1= in this case. The correct mapping would be ... [C:2]1=[CH:1] ... >> [C:1]1=[CH:2] ....
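One quick way to spot a swap like this in the mapped output is to compare the bracket label of each map number on both sides of the reaction. The sketch below uses only the standard library; `mapped_atoms` and `changed_labels` are hypothetical helper names, not part of RXNMapper. A changed label is only a hint, since many real reactions do change hydrogen counts, but in this coupling neither ring carbon should gain or lose a hydrogen.

```python
import re

# Matches bracket atoms that carry an atom-map number, e.g. [CH2:9] or [c:3].
MAPPED_ATOM = re.compile(r"\[([^\[\]:]+):(\d+)\]")


def mapped_atoms(smiles: str) -> dict[int, str]:
    """Return {map number: bracket label} for every mapped atom in a SMILES."""
    return {int(number): label for label, number in MAPPED_ATOM.findall(smiles)}


def changed_labels(mapped_rxn: str) -> dict[int, tuple[str, str]]:
    """Map numbers whose label differs between reactants and products."""
    reactants, products = mapped_rxn.split(">>")
    before, after = mapped_atoms(reactants), mapped_atoms(products)
    shared = sorted(before.keys() & after.keys())
    return {n: (before[n], after[n]) for n in shared if before[n] != after[n]}


# The mapped reaction reported at the top of this issue.
mapped = (
    "Cl[c:3]1[cH:4][cH:5][cH:6][cH:7][cH:8]1."
    "OB(O)[C:1]1=[CH:2][CH2:9][CH2:10][CH2:11][CH2:12]1"
    ">>[CH:1]1=[C:2]([c:3]2[cH:4][cH:5][cH:6][cH:7][cH:8]2)"
    "[CH2:9][CH2:10][CH2:11][CH2:12]1"
)
print(changed_labels(mapped))
```

For the reported output, only map numbers 1 and 2 change label (C becomes CH and CH becomes C), which is consistent with the two alkene carbons having been swapped.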

Although your example is straightforward to map for a human, it is challenging for an AI model that sees the molecules only as SMILES strings.

feiranl commented 2 years ago

Hi, I also want to report another case which is wrongly mapped.

rxn = L-glutamine <=> NH4(+) + L-glutamate

Smiles = NC(=O)CC[C@H]([NH3+])C(=O)[O-]>>[NH4+].[NH3+][C@@H](CCC(=O)[O-])C(=O)[O-]

mapped rxn = [NH3+:1][C@@H:2]([CH2:3][CH2:4][C:5](=[O:6])[NH2:11])[C:8](=[O:7])[O-:10]>>[NH3+:1][C@@H:2]([CH2:3][CH2:4][C:5](=[O:6])[O-:7])[C:8](=[O:9])[O-:10].[NH4+:11]

image

The change of O7 is not correct. The H2O is omitted in the reaction.

pschwllr commented 2 years ago

With the default parameters, RXNMapper will map all product atoms iteratively (as we assume that all atoms are present on the precursors side). The problem is that the last product oxygen atom to map overwrites another previously mapped oxygen atom, as you omitted H2O in the reaction. If you add the H2O, I guess it maps it correctly:

image
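A missing reagent such as H2O can be caught before mapping with a rough heavy-atom balance check. The following is a minimal sketch using only the standard library; `heavy_atoms` and `heavy_atom_imbalance` are hypothetical helpers, and the regex tokenizer is an assumption that covers bracket atoms and the common organic subset only. It ignores hydrogens and charges.

```python
import re
from collections import Counter

# Either a whole bracket atom ([NH3+], [C@@H], [O-]) or a bare organic-subset atom.
ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]]*\]|(Cl|Br|[BCNOPSFI]|[bcnops])")


def heavy_atoms(smiles_side: str) -> Counter:
    """Count non-hydrogen atoms per element on one side of a reaction SMILES."""
    counts = Counter()
    for bracket_symbol, bare_symbol in ATOM.findall(smiles_side):
        # Aromatic atoms are lowercase in SMILES; normalise 'c' to 'C'.
        symbol = (bracket_symbol or bare_symbol).capitalize()
        if symbol != "H":
            counts[symbol] += 1
    return counts


def heavy_atom_imbalance(rxn: str) -> dict[str, int]:
    """Return {element: products minus reactants} for unbalanced elements."""
    reactants, products = rxn.split(">>")
    before, after = heavy_atoms(reactants), heavy_atoms(products)
    elements = sorted(set(before) | set(after))
    return {el: after[el] - before[el] for el in elements if after[el] != before[el]}


# The reaction SMILES reported above, without and with water as a reactant.
rxn = "NC(=O)CC[C@H]([NH3+])C(=O)[O-]>>[NH4+].[NH3+][C@@H](CCC(=O)[O-])C(=O)[O-]"
print(heavy_atom_imbalance(rxn))
print(heavy_atom_imbalance("O." + rxn))
```

For the reaction as reported, the products contain one more oxygen than the reactants; prepending water (`O`) to the reactants removes the imbalance.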

avaucher commented 2 years ago

Closing due to inactivity.