Closed EasternCaveMan closed 9 months ago
Hi Roman, I tried to split my data using the identity-based double-cold split (I2), but I got this error:
(sail) [vat23@wibi-mickey enzyme_substrate_data]$ ls
All_sequences.fasta  molecule_data.tsv  split_C2  split_R
(sail) [vat23@wibi-mickey enzyme_substrate_data]$ datasail --e-type M --e-data molecule_data.tsv --e-sim ecfp --f-type P --f-data All_sequences.fasta --f-sim cdhit --output split_I2 --techniques I2 --splits 0.8 0.2 --names train test --runs 3 --solver SCIP
[23:51:30] SMILES Parse Error: syntax error while parsing: ID63558
[23:51:30] SMILES Parse Error: Failed parsing SMILES 'ID63558' for input: 'ID63558'
[23:51:30] SMILES Parse Error: syntax error while parsing: ID63559
[23:51:30] SMILES Parse Error: Failed parsing SMILES 'ID63559' for input: 'ID63559'
[23:51:30] SMILES Parse Error: syntax error while parsing: ID63560
[23:51:30] SMILES Parse Error: Failed parsing SMILES 'ID63560' for input: 'ID63560'
Traceback (most recent call last):
  File "/home/vat23/miniconda3/envs/sail/bin/datasail", line 11, in <module>
    sys.exit(sail())
  File "/home/vat23/miniconda3/envs/sail/lib/python3.10/site-packages/datasail/sail.py", line 227, in sail
    datasail_main(**kwargs)
  File "/home/vat23/miniconda3/envs/sail/lib/python3.10/site-packages/datasail/routine.py", line 58, in datasail_main
    inter_split_map, e_name_split_map, f_name_split_map, e_cluster_split_map, f_cluster_split_map = run_solver(
  File "/home/vat23/miniconda3/envs/sail/lib/python3.10/site-packages/datasail/solver/solve.py", line 138, in run_solver
    inter=set(inter),
TypeError: 'NoneType' object is not iterable
Input structure of All_sequences.fasta:
>ID0
FFEGKNIFVTGGTGLLGKVLVEKILRSTPIGKIYVLVKADDQEAAVDRITKELINSELFRCLKEKHGKYYQAYIRETLIPIVGNICEPNLGMDSDSAHAIMEDVNVIIESAAITTLNERYDVSLEANVNSPQQLMRFAKTCKN
>ID1
MDPHNKGVAEAEFFTEYGEASRYEIQEVIGKGSYGIVGSVIDTHTGERVAIKKINDVFEHVSDATRILREIKKADP
Input structure of molecule_data.tsv:
   ids  SMILES
0  ID0  NC(=O)C1=CN(C=CC1)[C@@H]1O[C@H](COP(O)(=O)OP(O...
1  ID1  NC1=NC=NC2=C1N=CN2[C@@H]1O[C@H](COP(O)(=O)OP(O...
2  ID2  NC1=NC=NC2=C1N=CN2[C@@H]1O[C@H](COP(O)(=O)OP(O...
3  ID3  NC1=NC=NC2=C1N=CN2[C@@H]1O[C@H](COP(O)(=O)OP(O...
4  ID4  N[C@@H](CCC(=O)N[C@@H](CSCO)C(=O)NCC(O)=O)C(O)=O
I am looking forward to hearing from you.
Best,
Vahid
That's the same problem as in issue #13.
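The log above shows identifier strings such as ID63558 being handed to the SMILES parser, so one thing that may be worth checking is whether every row of molecule_data.tsv really has the shape "identifier, tab, SMILES". The following is only a rough sketch using the Python standard library, not part of DataSAIL: the function name `find_suspect_rows` is hypothetical, and it assumes a single header line, exactly two tab-separated fields per row, and identifiers of the form `ID<digits>`.

```python
import csv
import io
import re

# Assumption: identifiers look like ID0, ID1, ... as in the preview above.
ID_PATTERN = re.compile(r"^ID\d+$")


def find_suspect_rows(handle, expected_fields=2):
    """Return (line_number, reason) pairs for rows that do not look like 'id<TAB>SMILES'."""
    suspects = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line (assumed to be present)
    for line_number, row in enumerate(reader, start=2):
        if len(row) != expected_fields:
            suspects.append(
                (line_number, f"expected {expected_fields} fields, found {len(row)}")
            )
        elif not row[1].strip():
            suspects.append((line_number, "empty SMILES field"))
        elif ID_PATTERN.match(row[1].strip()):
            suspects.append((line_number, "SMILES field holds an identifier"))
    return suspects


# Small in-memory example with three deliberately broken rows.
sample = "ids\tSMILES\nID0\tCCO\nID1\tID1\nID2\t\n0\tID3\tCCO\n"
result = find_suspect_rows(io.StringIO(sample))
for line_number, reason in result:
    print(line_number, reason)
```

To run the same check on the real file, open it with `open("molecule_data.tsv", newline="")` and pass the handle to `find_suspect_rows`. Rows reported with three fields would hint at an extra index column, which is one possible (unconfirmed) explanation for the parse errors.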