rxn4chemistry / rxnmapper

RXNMapper: Unsupervised attention-guided atom-mapping. Code complementing our Science Advances publication on "Extraction of organic chemistry grammar from unsupervised learning of chemical reactions" (https://advances.sciencemag.org/content/7/15/eabe4166).
http://rxnmapper.ai
MIT License

ValueError when processing reaction smiles #20

Closed pgg1610 closed 2 years ago

pgg1610 commented 2 years ago

Hello,

First, I appreciate the development and documentation that have gone into making this tool, as well as the user-friendly web interface that makes interpretation easier.

Now to the issue:

When I run a list of reaction SMILES through a default instance of RXNMapper to generate atom mappings, I get the following error:

[xx:xx:xx] Explicit valence for atom # 5 N, 4, is greater than permitted

Traceback (most recent call last):
  File "...rxnmapper/lib/python3.6/site-packages/rxnmapper/attention.py", line 58, in __init__
    ">>"
ValueError: '>>' is not in list

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "----.py", line 30, in <module>
    results = rxn_mapper.get_attention_guided_atom_maps(split_df.rxn.to_list())
  File "--/rxnmapper/lib/python3.6/site-packages/rxnmapper/core.py", line 207, in get_attention_guided_atom_maps
    detailed_output,  # Return attentions when detailed output requested
  File "--/rxnmapper/attention.py", line 64, in __init__
    "rxn smiles is not a complete reaction. Can't find the '>>' to separate the products"
ValueError: rxn smiles is not a complete reaction. Can't find the '>>' to separate the products

I have checked that every reaction entry contains a >>, so that does not seem to be the problem. I suspect that the valence error reported above raises an exception that halts the code.

The following is a minimal snippet of the script being run:

import pickle

import numpy as np
from rxnmapper import RXNMapper

rxn_mapper = RXNMapper()

# df_rxn is a DataFrame with the columns 'rxn' (reaction SMILES) and 'Reaction_ID'
for index, split_df in enumerate(np.array_split(df_rxn, 100)):
    # Split the dataframe for ease of computation
    rxn_smiles = []
    atom_maps = []
    confidence = []
    rxn_id = []
    results = rxn_mapper.get_attention_guided_atom_maps(split_df.rxn.to_list())
    rxn_smiles.append(split_df.rxn.to_list())
    atom_maps.append([entry['mapped_rxn'] for entry in results])
    confidence.append([entry['confidence'] for entry in results])
    rxn_id.append(list(split_df.Reaction_ID))

    save_dict = {'rxn_smiles': rxn_smiles, 'atom_maps': atom_maps, 'confidence': confidence, 'rxn_id': rxn_id}
    # '{}' instead of '{1}': with a single argument, '{1}' raises an IndexError
    with open('atom_map_trxn_{}.pkl'.format(index), 'wb') as f:
        pickle.dump(save_dict, f)

    print('Done with rxnmapper split:{}'.format(index))
    del split_df, results

I would like to know how to resolve this. Thank you.

pschwllr commented 2 years ago

Could you find out which reaction is causing the error and share it? You could for example iterate through your reactions one-by-one until it raises such an error.
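One way to do that one-by-one check is sketched below. This is only an illustration: find_failing_reactions is a hypothetical helper, not part of RXNMapper, and it assumes the mapping call raises an exception for a bad entry, as in the traceback above. It passes each reaction to the mapper as a single-item batch and records the ones that fail.

```python
from typing import Callable, List, Sequence, Tuple


def find_failing_reactions(
    rxns: Sequence[str],
    map_fn: Callable[[List[str]], object],
) -> List[Tuple[int, str, str]]:
    """Call map_fn on each reaction alone and collect the ones that raise.

    Hypothetical helper: map_fn is any callable that takes a list of
    reaction SMILES, e.g. rxn_mapper.get_attention_guided_atom_maps.
    Returns (index, reaction, error message) for every failing entry.
    """
    failures = []
    for i, rxn in enumerate(rxns):
        try:
            map_fn([rxn])  # batch of one, so a failure points at this entry
        except Exception as exc:  # broad on purpose: we only want to locate the entry
            failures.append((i, rxn, repr(exc)))
    return failures


# Possible usage with the objects from the script above:
# bad = find_failing_reactions(df_rxn.rxn.to_list(),
#                              rxn_mapper.get_attention_guided_atom_maps)
# for i, rxn, err in bad:
#     print(i, rxn, err)
```

Mapping one reaction at a time is slower than batching, so this is meant for diagnosis rather than for the production run.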

[xx:xx:xx] Explicit valence for atom # 5 N, 4, is greater than permitted

It looks like one of your reactions contains a SMILES that RDKit considers invalid.

Have you canonicalised the molecules in the reactions with RDKit before using RXNMapper?
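A minimal sketch of such a canonicalisation step follows, assuming plain reaction SMILES of the form reactants>agents>products with no CXSMILES extensions. The names rdkit_canonical and canonicalize_reaction are hypothetical helpers; the only RDKit calls used are Chem.MolFromSmiles and Chem.MolToSmiles. A return value of None flags an entry that should be inspected or dropped before mapping.

```python
from typing import Callable, Optional


def rdkit_canonical(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if RDKit cannot parse the input."""
    from rdkit import Chem  # imported lazily; only needed for the default backend

    mol = Chem.MolFromSmiles(smiles)  # None for unparsable or invalid molecules
    return None if mol is None else Chem.MolToSmiles(mol)


def canonicalize_reaction(
    rxn: str,
    canonical: Callable[[str], Optional[str]] = rdkit_canonical,
) -> Optional[str]:
    """Canonicalise every molecule in 'reactants>agents>products'.

    Returns None if the string does not have exactly three '>'-separated
    parts, if reactants or products are empty, or if any molecule is invalid.
    """
    parts = rxn.strip().split(">")
    if len(parts) != 3:
        return None
    cleaned = []
    for part in parts:
        mols = []
        for smi in filter(None, part.split(".")):  # skip empty fragments
            can = canonical(smi)
            if can is None:
                return None
            mols.append(can)
        cleaned.append(".".join(mols))
    if not cleaned[0] or not cleaned[2]:
        return None
    return ">".join(cleaned)


# Possible usage: keep only the reactions that survive the check.
# valid = [c for c in map(canonicalize_reaction, rxn_list) if c is not None]
```

Filtering this way changes the length of the list, so any accompanying reaction IDs would need to be filtered with the same mask.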

pgg1610 commented 2 years ago

Apologies for my late reply. I sifted through the list of reaction strings and found the culprit: an entry with a trailing / in it. Thanks for the informative reply; I appreciate it.