otori-bird / retrosynthesis

MIT License
58 stars, 14 forks

generate_RtoP_data.py crashes #6

Closed: masun closed this issue 1 year ago

masun commented 1 year ago

Dear all, I ran the script generate_PtoR.py on my data and everything was fine, but when I run the generate_RtoP_data.py script on the same data, it crashes at this line: map_number = reactant_candidates[0][0]

I printed the reactant_candidates values. Strangely, the list is sometimes empty without the script crashing, while at other times it crashes with the message below. I get the same problem on both Linux and Windows. Please help. Thank you, best

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/miniconda3/miniconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/miniconda3/miniconda3/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "preprocessing/generate_RtoP_data.py", line 453, in multi_process
    map_number = reactant_candidates[0][0]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "preprocessing/generate_RtoP_data.py", line 611, in <module>
    src_data, tgt_data = preprocess(
  File "preprocessing/generate_RtoP_data.py", line 315, in preprocess
    results = pool.map(func=multi_process,iterable=data)
  File "/gpfs1/data/chembiom/tools/miniconda3/miniconda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/gpfs1/data/chembiom/tools/miniconda3/miniconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range

otori-bird commented 1 year ago

I suspect the error is caused by a failure to extract any reactant that shares an atom-mapping number with the product. Could you share some failing examples?
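The thread does not show how generate_RtoP_data.py builds reactant_candidates, so the following is only a hypothetical sketch of the idea described above: keep the reactants that share at least one atom-map number with the product side, and skip the reaction when that list is empty instead of indexing [0][0]. The helper names (map_numbers, find_reactant_candidates, first_map_number) and the (map_number, reactant_smiles) tuple layout are assumptions, and map numbers are read with a simple regular expression rather than a real SMILES parser.

```python
import re
from typing import List, Optional, Set, Tuple

# Atom-map number at the end of a bracket atom, e.g. "[CH3:1]" -> "1", "[n+:41]" -> "41"
MAP_NUM = re.compile(r":(\d+)\]")


def map_numbers(smiles: str) -> Set[int]:
    """Return the set of atom-map numbers found in bracket atoms of a SMILES string."""
    return {int(m) for m in MAP_NUM.findall(smiles)}


def find_reactant_candidates(rxn: str) -> List[Tuple[int, str]]:
    """Hypothetical: pair each reactant with the smallest map number it shares with the products.

    Expects "reactants>reagents>products" (reagents may be empty, as in "A>>B").
    """
    reactants, _, products = rxn.split(">")
    product_maps = map_numbers(products)
    candidates = []
    for mol in reactants.split("."):
        shared = map_numbers(mol) & product_maps
        if shared:
            candidates.append((min(shared), mol))
    return candidates


def first_map_number(rxn: str) -> Optional[int]:
    """Return the first candidate's map number, or None when no reactant shares a map number."""
    candidates = find_reactant_candidates(rxn)
    if not candidates:
        # Guard instead of candidates[0][0], which raises IndexError on an empty list
        return None
    return candidates[0][0]


if __name__ == "__main__":
    print(first_map_number("[CH3:1][OH:2]>>[CH2:1]=[O:2]"))
    print(first_map_number("[CH3:1][OH:2]>>[CH2:7]=[O:8]"))
```

In a pool.map setup like the one in the traceback, a worker written this way could return None for such reactions, and the caller could drop those results before assembling src_data and tgt_data.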

masun commented 1 year ago

Dear Otori, here are the reactions, taken from KEGG. Please note that the reactant_candidates list is empty for all three of them, but the script does not always crash. Thank you for your help. Best

reaction_ID,reactants>reagents>production
R10954,[CH3:1][O:2]C:3[c:5]1c:6[cH:8][c:9]2[cH:10][c:11]3c:12C:18[c:20]1c:22[cH:24]c:25[cH:28][c:30]1[C:32]3=[O:33].[NH2:34]C:35[C:37]1=[CH:81]N:41C@H:69[C@@H:75]3[OH:76])C@@H:77[C@H:79]2[OH:80])[CH:40]=[CH:39][CH2:38]1.[OH2:31].[O:23]=[O:29].[H+]>>[CH3:1][O:2]C:3[c:5]1c:6[cH:8][c:9]2[cH:10][c:11]3c:12C:18[C@:20]1([OH:21])C:22[CH:24]=C:25C@@H:28[C@:30]1([OH:31])[C:32]3=[O:33].[NH2:34]C:35[c:37]1[cH:38][cH:39][cH:40]n+:41C@H:69[C@@H:75]3[OH:76])C@@H:77[C@H:79]2[OH:80])[cH:81]1.[OH2:82]
R09281,[NH2:1]C:2[c:4]1[cH:5]n+:6C@H:34[C@@H:40]3[OH:41])C@@H:42[C@H:44]2[OH:45])[cH:46][cH:47][cH:48]1.[OH:49][CH2:50][CH2:51][CH2:52]C:53[OH:55]>>[NH2:1]C:2[C:4]1=[CH:5]N:6C@H:34[C@@H:40]3[OH:41])C@@H:42[C@H:44]2[OH:45])[CH:46]=[CH:47][CH2:48]1.[O:49]=[CH:50][CH2:51][CH2:52]C:53[OH:55].[H+:56]
R09289,[NH2:1]C:2[c:4]1[cH:5]n+:6C@H:34[C@@H:40]3[OH:41])C@@H:42[C@H:44]2[OH:45])[cH:46][cH:47][cH:48]1.[OH:49][CH2:50][CH2:51]C:52[OH:54]>>[NH2:1]C:2[C:4]1=[CH:5]N:6C@H:34[C@@H:40]3[OH:41])C@@H:42[C@H:44]2[OH:45])[CH:46]=[CH:47][CH2:48]1.[O:49]=[CH:50][CH2:51]C:52[OH:54].[H+:55]

otori-bird commented 1 year ago

Thanks for your examples, but RDKit seems to be unable to parse the reaction SMILES you provided.

I ran the script below and got RDKit errors:

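from rdkit import Chem
# Reaction R09289 from the examples above
rxn = "[NH2:1]C:2[c:4]1[cH:5]n+:6[cH:46][cH:47][cH:48]1.[OH:49][CH2:50][CH2:51]C:52[OH:54]>>[NH2:1]C:2[C:4]1=[CH:5]N:6[CH:46]=[CH:47][CH2:48]1.[O:49]=[CH:50][CH2:51]C:52[OH:54].[H+:55]"
# Split into the reactant side and the product side
r, p = rxn.split(">>")
# MolFromSmiles returns None (and logs a parse error) when the SMILES is invalid
pmol = Chem.MolFromSmiles(p)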

The error message was:

SMILES Parse Error: unclosed ring for input: '[NH2:1]C:2[C:4]1=[CH:5]N:6[CH:46]=[CH:47][CH2:48]1.[O:49]=[CH:50][CH2:51]C:52[OH:54].[H+:55]'

Could you please check the validity of these examples or provide me with other examples?
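Invalid rows like these can be screened out before preprocessing. The general check is RDKit's own parser: Chem.MolFromSmiles returns None when a SMILES cannot be parsed, as used in the snippet above. As a dependency-free alternative, the sketch below flags only the specific error class reported here: ring-bond labels (a digit, or %nn, outside bracket atoms) that are opened but never closed on one side of the reaction. The function names are hypothetical, the rarer %(nnn) ring-label syntax is not handled, and it is not a substitute for a full SMILES parser.

```python
from typing import Dict, List, Set


def unclosed_ring_labels(smiles: str) -> Set[str]:
    """Return ring-closure labels that are opened but never closed.

    Bracket atoms such as "[CH3:1]" are skipped entirely, because digits inside
    brackets are isotopes, H counts, charges or atom-map numbers, not ring bonds.
    Outside brackets, a digit or "%nn" is a ring-bond label; each occurrence
    toggles that label between open and closed.
    """
    open_labels: Set[str] = set()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            close = smiles.find("]", i)
            if close == -1:
                raise ValueError(f"unterminated bracket atom at position {i}")
            i = close + 1
            continue
        if ch == "%":
            label = smiles[i + 1:i + 3]  # two-digit ring label, e.g. "%10"
            i += 3
        elif ch.isdigit():
            label = ch
            i += 1
        else:
            i += 1
            continue
        # The first occurrence opens a ring bond, the next one closes it
        if label in open_labels:
            open_labels.remove(label)
        else:
            open_labels.add(label)
    return open_labels


def find_unclosed_rings(rxn: str) -> Dict[str, List[str]]:
    """Check each side of "reactants>reagents>products" separately.

    Ring closures may span "." within one side, so sides are not split into molecules.
    """
    report: Dict[str, List[str]] = {}
    for name, side in zip(("reactants", "reagents", "products"), rxn.split(">")):
        labels = unclosed_ring_labels(side)
        if labels:
            report[name] = sorted(labels)
    return report


if __name__ == "__main__":
    print(find_unclosed_rings("c1ccccc1>>C1CCCCC1"))
    print(find_unclosed_rings("C1CC>>CC"))
```

Rows that produce a non-empty report here, or for which Chem.MolFromSmiles returns None, could be removed from the input file before running generate_RtoP_data.py.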

masun commented 1 year ago

Dear Otori, yes, you are right. Some SMILES became invalid after applying the RXNMapper tool. It is recommended to use Indigo. Thank you very much.