rxn4chemistry / rxnmapper

RXNMapper: Unsupervised attention-guided atom-mapping. Code complementing our Science Advances publication on "Extraction of organic chemistry grammar from unsupervised learning of chemical reactions" (https://advances.sciencemag.org/content/7/15/eabe4166).
http://rxnmapper.ai
MIT License

Imbalanced Stoichiometry/ Missing Atom Mapping #28

Closed: JHucker closed this issue 2 years ago

JHucker commented 2 years ago

Hi all, I was hoping to get some help or information on mapping imbalanced reactions. I have a pipeline that uses rxnmapper for atom-atom mapping (AAM); downstream, a template is extracted from each mapped reaction and then validated. On a private dataset, an unacceptable number of validations fail.

For reproducibility, I checked for the same validation failures on USPTO. There, the failures appear to trace back to reactions that are imbalanced or have implied stoichiometry: a reactant is consumed more than once, but its atom map numbers are not repeated on the right-hand side (RHS), or are missing altogether.

Some examples:

example_n,original_uspto,pre_mapping,post_mapping
0,[C:1](#[N:4])[CH:2]=[CH2:3].[NH3:5]>>[C:1]([CH2:2][CH2:3][NH:5][CH2:3][CH2:2][C:1]#[N:4])#[N:4],C=CC#N.N>>N#CCCNCCC#N,[CH2:2]=[CH:7][C:8]#[N:9].[NH3:1]>>[N:1]#[C:2][CH2:3][CH2:4][NH:5][CH2:6][CH2:7][C:8]#[N:9]
1,[CH2:1]([C:4]#[N:5])[CH2:2][OH:3]>O>[C:4]([CH2:1][CH2:2][O:3][CH2:2][CH2:1][C:4]#[N:5])#[N:5],N#CCCO.O>>N#CCCOCCC#N,[N:1]#[C:2][CH2:3][CH2:4][OH:5]>>[N:1]#[C:2][CH2:3][CH2:4][O:5][CH2:6][CH2:7][C:8]#[N:9]
2,[CH2:1]([NH2:4])[CH2:2][NH2:3].[CH:5]([CH:7]=O)=O>>[NH:3]1[CH:2]2[NH:3][CH2:5][CH2:7][NH:4][CH:1]2[NH:4][CH2:1][CH2:2]1,NCCN.O=CC=O>>C1CNC2NCCNC2N1,O=[CH:1][CH:7]=O.[CH2:2]([NH2:3])[CH2:4][NH2:5]>>[CH2:1]1[CH2:2][NH:3][CH:4]2[NH:5][CH2:6][CH2:7][NH:8][CH:9]2[NH:10]1
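  1. Example 0: it appears that 2x "C=CC#N" are required, and maybe that atom maps 2, 7, 8 and 9 should be repeated on the RHS?
  2. Example 1: 2x "N#CCCO" are required; likewise, maybe atom maps 1-4 should be repeated on the RHS.
  3. Example 2: 2x "NCCN" are required; likewise, maybe atom maps 2-5 should be repeated on the RHS.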
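
One way to flag such cases automatically is to compare the atom map numbers on the two sides of a mapped reaction. The snippet below is a minimal sketch and not part of rxnmapper: the helper names `map_numbers` and `unmatched_product_maps` are hypothetical, and it assumes that map numbers always appear as `:N]` at the end of a bracket atom. It is applied to the 'post_mapping' string of example 0.

```python
import re

# Assumption: an atom map number is written as ":N]" at the end of a bracket atom.
_MAP_NUMBER = re.compile(r":(\d+)\]")


def map_numbers(smiles):
    """Return the set of atom map numbers found in a (multi-molecule) SMILES."""
    return {int(n) for n in _MAP_NUMBER.findall(smiles)}


def unmatched_product_maps(mapped_rxn):
    """Return product map numbers that have no counterpart among the reactants.

    Accepts both "reactants>>products" and "reactants>agents>products".
    """
    reactants, _agents, products = mapped_rxn.split(">")
    return sorted(map_numbers(products) - map_numbers(reactants))


example_0 = (
    "[CH2:2]=[CH:7][C:8]#[N:9].[NH3:1]"
    ">>[N:1]#[C:2][CH2:3][CH2:4][NH:5][CH2:6][CH2:7][C:8]#[N:9]"
)
print(unmatched_product_maps(example_0))  # -> [3, 4, 5, 6]
```

A non-empty result means some product atoms cannot be traced back to any reactant atom, which is the symptom described in the list above. An empty result does not prove that the mapping is chemically correct.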

Is it realistic to expect rxnmapper to handle imbalanced reactions in the manner I've described, or should they be balanced prior to mapping? Please let me know if any further information or examples are required.

Appreciate your assistance with this.

Other information

This is with rxnmapper 0.2.0; the outcome is the same whether canonicalize_rxns is False or True.

Note that SMILES in 'pre_mapping' have undergone a standardisation process and those in 'post_mapping' have had molecules with no atom mapping removed.

avaucher commented 2 years ago

Hi @JHucker,

You are right, reactions with reactants "consumed" multiple times are not predicted adequately in the current version of rxnmapper. I do not see an easy fix for that, and it would probably require some extended research / trial-and-error to detect this automatically.

Balancing the reactions prior to mapping should usually work.
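
As a rough illustration of what balancing before mapping could look like, the sketch below checks which heavy atoms the product needs beyond what the reactants supply, then repeats one reactant in the unmapped reaction SMILES. It is not part of rxnmapper, and all function names are hypothetical. Two assumptions apply: the atom counter only understands unbracketed organic-subset atoms (as in the 'pre_mapping' column above), and the multiplier is supplied by the caller rather than inferred.

```python
import re
from collections import Counter

# Assumption: organic-subset atoms only; bracket atoms such as [nH] are rejected.
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles):
    """Count heavy atoms per element in an unbracketed SMILES string."""
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    # capitalize() folds aromatic atoms ("c") into their element ("C").
    return Counter(token.capitalize() for token in _ATOM.findall(smiles))


def product_deficit(rxn):
    """Heavy atoms the products contain beyond what the reactants supply."""
    reactants, _agents, products = rxn.split(">")
    # Counter subtraction keeps only the positive differences.
    return heavy_atom_counts(products) - heavy_atom_counts(reactants)


def repeat_reactant(rxn, reactant, times):
    """Return the reaction with every copy of `reactant` written `times` times."""
    reactants, agents, products = rxn.split(">")
    expanded = []
    for mol in reactants.split("."):
        expanded.extend([mol] * (times if mol == reactant else 1))
    return ">".join([".".join(expanded), agents, products])


rxn = "C=CC#N.N>>N#CCCNCCC#N"
print(product_deficit(rxn))   # Counter({'C': 3, 'N': 1})
balanced = repeat_reactant(rxn, "C=CC#N", 2)
print(balanced)               # C=CC#N.C=CC#N.N>>N#CCCNCCC#N
print(product_deficit(balanced))  # Counter()
```

The balanced string can then be passed to rxnmapper in place of the original. Surplus reactant atoms (for example, water lost in a condensation) are deliberately ignored, since only a product-side deficit indicates a reactant that is consumed more than once.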

Let us know if that doesn't answer your question, and sorry for not being able to provide you with an easy solution.

JHucker commented 2 years ago

Thanks, @avaucher. This is sufficient information for now. In future, though, it would be excellent to see this introduced as a new feature. I will re-raise the issue if I can obtain sufficient justification and more examples, including output from other AAM tools that re-use reactants.