rxn4chemistry / rxnmapper

RXNMapper: Unsupervised attention-guided atom-mapping. Code complementing our Science Advances publication on "Extraction of organic chemistry grammar from unsupervised learning of chemical reactions" (https://advances.sciencemag.org/content/7/15/eabe4166).
http://rxnmapper.ai
MIT License

use get_attention_guided_atom_maps on generic compound #46

Closed steven-bioinfo closed 1 year ago

steven-bioinfo commented 1 year ago

Hello,

I want to run get_attention_guided_atom_maps on generic compounds whose SMILES contain [*] (any atom). In that case I get this error: ValueError: could not broadcast input array from shape (53,) into shape (52,)

Code to reproduce this error:

from rxnmapper import RXNMapper
rxn_mapper = RXNMapper()
res = rxn_mapper.get_attention_guided_atom_maps(["O[*].CC(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(OP([O-])(OC[C@@H]1([C@@H](OP([O-])([O-])=O)[C@@H](O)[C@@H](O1)N2(C3(\\N=C/N=C(C(\\N=C/2)=3)/N))))=O)([O-])=O>>CC(=O)O[*].CC(C)(COP([O-])(=O)OP(OC[C@H]3(O[C@@H](N1(C2(\\N=C/N=C(C(\\N=C/1)=2)/N)))[C@H](O)[C@H](OP([O-])(=O)[O-])3))(=O)[O-])[C@@H](O)C(=O)NCCC(=O)NCCS"])

If I remove the [*] from the reaction SMILES, the function works. How can we run this function with a [*]?

thanks

avaucher commented 1 year ago

Hi @steven-bioinfo,

The easiest solution is to run the mapper without canonicalization. You can do this with the canonicalize_rxns=False option:

from rxnmapper import RXNMapper
rxn_mapper = RXNMapper()
res = rxn_mapper.get_attention_guided_atom_maps(
    [
        "O[*].CC(=O)SCCNC(=O)CCNC(=O)[C@H](O)C(C)(C)COP(OP([O-])(OC[C@@H]1([C@@H](OP([O-])([O-])=O)[C@@H](O)[C@@H](O1)N2(C3(\\N=C/N=C(C(\\N=C/2)=3)/N))))=O)([O-])=O>>CC(=O)O[*].CC(C)(COP([O-])(=O)OP(OC[C@H]3(O[C@@H](N1(C2(\\N=C/N=C(C(\\N=C/1)=2)/N)))[C@H](O)[C@H](OP([O-])(=O)[O-])3))(=O)[O-])[C@@H](O)C(=O)NCCC(=O)NCCS"
    ],
    canonicalize_rxns=False,
)

which then returns the following:

[{'confidence': 0.2130105220785321, 'mapped_rxn': '[OH:4][*:5].[CH3:1][C:2](=[O:3])[S:53][CH2:52][CH2:51][NH:50][C:48](=[O:49])[CH2:47][CH2:46][NH:45][C:43](=[O:44])[C@H:41]([OH:42])[C:7]([CH3:6])([CH3:8])[CH2:9][O:10][P:11]([O:14][P:15]([O-:40])([O:16][CH2:17][C@@H:18]1[C@@H:33]([O:34][P:35]([O-:36])([O-:38])=[O:37])[C@@H:31]([OH:32])[C@H:20]([N:21]2[C:22]3=[C:27]([C:26]([NH2:30])=[N:25][CH:24]=[N:23]3)/[N:28]=[CH:29]\\2)[O:19]1)=[O:39])([O-:12])=[O:13]>>[CH3:1][C:2](=[O:3])[O:4][*:5].[CH3:6][C:7]([CH3:8])([CH2:9][O:10][P:11]([O-:12])(=[O:13])[O:14][P:15]([O:16][CH2:17][C@H:18]1[O:19][C@@H:20]([N:21]2[C:22]3=[C:27]([C:26]([NH2:30])=[N:25][CH:24]=[N:23]3)/[N:28]=[CH:29]\\2)[C@H:31]([OH:32])[C@@H:33]1[O:34][P:35]([O-:36])(=[O:37])[O-:38])(=[O:39])[O-:40])[C@@H:41]([OH:42])[C:43](=[O:44])[NH:45][CH2:46][CH2:47][C:48](=[O:49])[NH:50][CH2:51][CH2:52][SH:53]'}]
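To sanity-check a result like this, it can help to confirm that the wildcard atom carries the same map number on both sides and that reactants and products use the same set of map numbers. The following is a minimal sketch using only the standard library. The helper names (`atom_map_numbers`, `wildcard_map_numbers`, `is_fully_mapped`) are hypothetical and not part of rxnmapper, and the short reaction string is a hand-written toy example, not model output.

```python
import re

# Matches the atom-map number at the end of a bracket atom, e.g. "[CH3:1]" or "[O-:12]".
MAP_NUMBER = re.compile(r":(\d+)\]")
# Matches the atom-map number of a wildcard atom only, e.g. "[*:5]".
WILDCARD_MAP_NUMBER = re.compile(r"\[\*:(\d+)\]")


def atom_map_numbers(smiles):
    """Return the sorted atom-map numbers found in a mapped SMILES string."""
    return sorted(int(n) for n in MAP_NUMBER.findall(smiles))


def wildcard_map_numbers(smiles):
    """Return the sorted atom-map numbers assigned to wildcard atoms."""
    return sorted(int(n) for n in WILDCARD_MAP_NUMBER.findall(smiles))


def is_fully_mapped(mapped_rxn):
    """True if reactants and products carry exactly the same map numbers."""
    reactants, products = mapped_rxn.split(">>")
    return atom_map_numbers(reactants) == atom_map_numbers(products)


# Hand-written toy reaction (an assumption for illustration, not rxnmapper output).
toy_rxn = "[OH:4][*:5].[CH3:1][C:2](=[O:3])[SH:6]>>[CH3:1][C:2](=[O:3])[O:4][*:5].[SH2:6]"
toy_reactants, toy_products = toy_rxn.split(">>")

print(is_fully_mapped(toy_rxn))
print(wildcard_map_numbers(toy_reactants), wildcard_map_numbers(toy_products))
```

The same functions can be applied to the 'mapped_rxn' value of each returned dictionary. The check is purely textual: it does not validate the chemistry, only that the map numbers are consistent.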

Closing the issue, as I think this solves it. Feel free to reopen if needed.
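If only some of your reactions contain [*], one possible pattern is to try the default call first and retry without canonicalization only when it raises the ValueError reported above. This is a sketch under assumptions: `map_with_fallback` is a hypothetical helper, not part of rxnmapper, and it relies only on the get_attention_guided_atom_maps call and the canonicalize_rxns option shown in this thread. It also assumes that any ValueError from the default call is worth retrying, which may be too broad for your use case.

```python
def map_with_fallback(mapper, rxns):
    """Map reactions, retrying without canonicalization on ValueError.

    `mapper` is expected to be an RXNMapper instance (or any object with a
    compatible get_attention_guided_atom_maps method).
    """
    try:
        # Default behaviour, as in the original report.
        return mapper.get_attention_guided_atom_maps(rxns)
    except ValueError:
        # The broadcast error in this issue is a ValueError; retry with
        # canonicalization disabled, as suggested above.
        return mapper.get_attention_guided_atom_maps(rxns, canonicalize_rxns=False)
```

Usage would look like `res = map_with_fallback(RXNMapper(), [rxn_smiles])`. Note that the fallback applies to the whole batch, so if a single reaction fails, all reactions in that call are mapped without canonicalization.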