Closed xiaoqiangsheng2016 closed 2 years ago
You're right - the mapping is wrong in this specific case. The beauty of the approach is that it learned to map reactions unsupervised (that is, without labelled examples). The downside is that it is not straightforward to improve and correct the model. Reporting wrongly mapped reactions, as you just did, is very helpful, as we will consider them for the next RXNMapper version.
Let me explain what happens by looking at the attention weights on the RXNMapper Demo. Consider product atoms 1 and 2 and their attention weights towards the reactant atoms:
Attention of product atom 1.
Attention of product atom 2.
We see that atom 2 has the stronger attention weight, so it is mapped first, and it is mapped to the wrong reactant atom.
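To make the "strongest attention is mapped first" behaviour concrete, here is a toy sketch of a greedy assignment over an attention matrix. It is a simplification for illustration, not RXNMapper's actual code: the function name `greedy_map`, the rule that each reactant atom is used only once, and the weights are all assumptions made up for this example.

```python
def greedy_map(attention):
    """Greedily map product atoms to reactant atoms.

    attention[p][r] is the (hypothetical) attention weight from
    product atom p to reactant atom r.
    """
    mapping = {}
    unmapped = set(range(len(attention)))
    free_reactants = set(range(len(attention[0])))
    while unmapped and free_reactants:
        # Pick the most confident remaining (product, reactant) pair.
        p, r = max(
            ((p, r) for p in unmapped for r in free_reactants),
            key=lambda pair: attention[pair[0]][pair[1]],
        )
        mapping[p] = r
        unmapped.remove(p)
        free_reactants.remove(r)
    return mapping


# Invented weights: product atom 0 weakly prefers reactant atom 0,
# but product atom 1 attends to reactant atom 0 much more strongly.
toy_attention = [
    [0.55, 0.45],
    [0.90, 0.10],
]
toy_mapping = greedy_map(toy_attention)
print(toy_mapping)
```

In this toy case the second product atom claims reactant atom 0 first, so the first product atom is left with reactant atom 1, even though it slightly preferred atom 0. That is the same kind of swap as described above.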
One reason for this attention pattern is how the SMILES are canonicalised in this specific case. The wrongly mapped part of the SMILES is:
... C1=C ... >> C1=C ...
RXNMapper incorrectly maps this to:
... [C:1]1=[CH:2] ... >> [C:1]1=[CH:2] ...
Due to the canonicalisation, the two carbons in the SMILES switched sides around the 1= in this case. The correct mapping would be:
... [C:2]1=[CH:1] ... >> [C:1]1=[CH:2] ...
Although your example is straightforward to map for a human, it is challenging for an AI model that sees the molecules only as SMILES strings.
Hi, I also want to report another case which is wrongly mapped.
rxn = L-glutamine <=> NH4(+) + L-glutamate
Smiles = NC(=O)CC[C@H]([NH3+])C(=O)[O-]>>[NH4+].[NH3+][C@@H](CCC(=O)[O-])C(=O)[O-]
mapped rxn = [NH3+:1][C@@H:2]([CH2:3][CH2:4][C:5](=[O:6])[NH2:11])[C:8](=[O:7])[O-:10]>>[NH3+:1][C@@H:2]([CH2:3][CH2:4][C:5](=[O:6])[O-:7])[C:8](=[O:9])[O-:10].[NH4+:11]
The change of O7 is not correct. The H2O is omitted in the reaction.
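A missing species like this can be caught before mapping by comparing heavy-atom counts on both sides of the reaction SMILES. The sketch below is only a rough check using the Python standard library: the regex is a crude tokenizer rather than a real SMILES parser, it ignores hydrogens, and it assumes the reaction is written as reactants>>products with no agents. The helper names are made up for this example.

```python
import re
from collections import Counter

# Bracket atoms like [NH3+] or [C@@H], or bare organic-subset atoms.
ATOM = re.compile(r"\[([^\]]+)\]|(Cl|Br|[BCNOPSFI]|[bcnops])")


def heavy_atoms(smiles):
    """Count heavy atoms per element in a SMILES string (rough)."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        if match.group(1) is not None:
            # Inside brackets: skip an isotope number, take the symbol.
            sym = re.match(r"\d*([A-Z][a-z]?|[a-z])", match.group(1))
            if sym is None:
                continue
            symbol = sym.group(1)
        else:
            symbol = match.group(2)
        if symbol != "H":
            counts[symbol.capitalize()] += 1
    return counts


def heavy_atom_imbalance(rxn_smiles):
    """Return {element: product count - reactant count} for non-zero entries."""
    reactants, products = rxn_smiles.split(">>")
    diff = heavy_atoms(products)
    diff.subtract(heavy_atoms(reactants))
    return {element: n for element, n in diff.items() if n != 0}


rxn = "NC(=O)CC[C@H]([NH3+])C(=O)[O-]>>[NH4+].[NH3+][C@@H](CCC(=O)[O-])C(=O)[O-]"
rxn_with_water = "NC(=O)CC[C@H]([NH3+])C(=O)[O-].O>>[NH4+].[NH3+][C@@H](CCC(=O)[O-])C(=O)[O-]"
print(heavy_atom_imbalance(rxn))
print(heavy_atom_imbalance(rxn_with_water))
```

For the reaction as reported, the products carry one more oxygen than the reactants. Once water is added on the reactant side, the heavy atoms balance.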
With the default parameters, RXNMapper will map all product atoms iteratively (as we assume that all atoms are present on the precursors side). The problem is that the last product oxygen atom to map overwrites another previously mapped oxygen atom, as you omitted H2O in the reaction. If you add the H2O, I guess it maps it correctly:
Closing due to inactivity.
For this reaction:
OB(C1=CCCCC1)O.ClC2=CC=CC=C2>>C3(C4=CC=CC=C4)=CCCCC3
the output mapping is:
Cl[c:3]1[cH:4][cH:5][cH:6][cH:7][cH:8]1.OB(O)[C:1]1=[CH:2][CH2:9][CH2:10][CH2:11][CH2:12]1>>[CH:1]1=[C:2]([c:3]2[cH:4][cH:5][cH:6][cH:7][cH:8]2)[CH2:9][CH2:10][CH2:11][CH2:12]1
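One quick way to inspect a mapping like the one above is to compare the bracket token of each map number on the reactant and product sides. The sketch below uses only a regex from the standard library, and the helper names are made up for this example. A changed token is just a hint to look closer, not proof of an error, since real reactions also change hydrogen counts and charges.

```python
import re

# Matches mapped bracket atoms such as [CH2:9] -> ("CH2", "9").
MAPPED_ATOM = re.compile(r"\[([^\]:]+):(\d+)\]")


def tokens_by_map(side):
    """Return {map number: atom token} for one side of a mapped reaction."""
    return {int(num): token for token, num in MAPPED_ATOM.findall(side)}


def changed_atoms(mapped_rxn):
    """Return {map number: (reactant token, product token)} where they differ."""
    reactants, products = mapped_rxn.split(">>")
    before = tokens_by_map(reactants)
    after = tokens_by_map(products)
    return {
        num: (before[num], after[num])
        for num in sorted(before.keys() & after.keys())
        if before[num] != after[num]
    }


mapped = (
    "Cl[c:3]1[cH:4][cH:5][cH:6][cH:7][cH:8]1."
    "OB(O)[C:1]1=[CH:2][CH2:9][CH2:10][CH2:11][CH2:12]1"
    ">>[CH:1]1=[C:2]([c:3]2[cH:4][cH:5][cH:6][cH:7][cH:8]2)"
    "[CH2:9][CH2:10][CH2:11][CH2:12]1"
)
print(changed_atoms(mapped))
```

Here atoms 1 and 2 are the only ones whose tokens change: the boron-bearing carbon (no hydrogen) is mapped to a product CH, and the reactant CH is mapped to the product carbon that carries the phenyl ring. That hydrogen swap is what makes this mapping look suspicious.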