korem-lab / SGVFinder2

An error resulting to the fail of smp file generation #11

Closed ShawRyu closed 6 months ago

ShawRyu commented 6 months ago

Hi, I encountered an error while running the ICRA CLI. The command is:

    parallel -j 24 "
      icra \
        --outfol $wd/$job_pre.res/01.icra_mapping \
        --fq1 $wd/00.CleanData/{1}_reformat_1_kneaddata_paired_1.fastq \
        --fq2 $wd/00.CleanData/{1}_reformat_1_kneaddata_paired_2.fastq \
        --db /storage2/database/proGenomes1_SV \
        --threads 4;
      rm $wd/$job_pre.res/01.icra_mapping/{1}_reformat_1_kneaddata_paired.pmp;
      rm $wd/$job_pre.res/01.icra_mapping/{1}_reformat_1_kneaddata_paired.bam
    " ::: $(ls $wd/00.CleanData | cut -d _ -f1 | sort -u) &

It worked on most of my samples, but 10 of them failed with the same error, shown below:

    Running ICRA single_file command...
    Running ICRA on paired-end reads!
    Forward- /storage2/proj.structural_variation//00.CleanData/C541_reformat_1_kneaddata_paired_1.fastq.gz
    Reverse- /storage2/proj.structural_variation//00.CleanData/C541_reformat_1_kneaddata_paired_2.fastq.gz

    this is the len of delta: 11413603
    Finished running ICRA, saving results to /storage2/proj.structural_variation//2023_Yunnan.res/01.icra_mapping
    Running get_sample_map on /storage2/proj.structural_variation//2023_Yunnan.res/01.icra_mapping/C541_reformat_1_kneaddata_paired.jsdel, output will be saved to /storage2/proj.structural_variation//2023_Yunnan.res/01.icra_mapping/C541_reformat_1_kneaddata_paired.smp
    2024-03-12 12:52:45,132-INFO: Average read length for C541_reformat_1_kneaddata_paired_1.fastq.gz set to 149
    2024-03-12 13:02:24,202-INFO: _initialize: Parameter init complete. Time: 0:09:39.314159
    2024-03-12 13:02:52,106-INFO: Iteration 2 - Time: 0:00:27.903469, dPi = 1.78e-01, nPi = 541
    2024-03-12 13:03:20,080-INFO: Iteration 3 - Time: 0:00:55.877896, dPi = 8.53e-02, nPi = 536
    ...
    2024-03-12 13:43:56,025-INFO: Final result - Time: 0:41:31.822526
    Traceback (most recent call last):
      File "/storage1/miniconda3/envs/SGVFinder2/bin/icra", line 8, in <module>
        sys.exit(run())
      File "/storage1/miniconda3/envs/SGVFinder2/lib/python3.10/site-packages/SGVFinder2/cli/icra_cli.py", line 63, in run
        sample_map = get_sample_map(jsdel_file,args.db+'.dlen',args.x_coverage, args.rate_param)
      File "/storage1/miniconda3/envs/SGVFinder2/lib/python3.10/site-packages/SGVFinder2/svfinder.py", line 62, in get_sample_map
        bacid_maps[dest_id][ind2] += used_koef
    IndexError: index 402 is out of bounds for axis 0 with size 402

This error prevents the .smp file from being generated, although the .jsdel file seems to be generated correctly.

Thank you for your attention.

ym2877 commented 6 months ago

That's strange that it worked on some samples and not the others. That leads me to believe it might be due to something specific about those files rather than the code itself. Do those files by any chance have a low/zero number of reads?
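One way to check for this is to summarise read lengths in each input file. The following is a minimal sketch, assuming standard four-line FASTQ records; the function names `open_fastq` and `fastq_length_summary` are hypothetical helpers for illustration and are not part of SGVFinder2.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_length_summary(path):
    """Return (n_reads, n_empty, min_len, max_len) for a 4-line-per-record FASTQ."""
    n_reads = n_empty = 0
    min_len = max_len = None
    with open_fastq(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 1:  # the sequence is the 2nd line of each record
                continue
            length = len(line.rstrip("\r\n"))
            n_reads += 1
            n_empty += length == 0  # True counts as 1
            min_len = length if min_len is None else min(min_len, length)
            max_len = length if max_len is None else max(max_len, length)
    return n_reads, n_empty, min_len, max_len
```

Running this on both mates of a failing sample and of a working sample should show whether the failing ones contain empty or unusually short reads.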

ShawRyu commented 6 months ago

Hi, I have some questions about creating the database. How did you build the reference database? I downloaded representatives.contigs.fasta.gz from https://progenomes1.embl.de/representatives.cgi, then split it into 5487 files and ran the createdb step. Thank you for your help. We can exchange contact information to discuss this together. My email is 386585845@qq.com

Hi Jiushao, I didn't run into that problem, and I think Yoli has explained it quite well in your issue.

talkorem commented 6 months ago

Dear @jiushao12345 , please maintain the separation between issues. If there is a different pending issue you can start a new issue. Thanks.

jiushao12345 commented 6 months ago

@ShawRyu Could you share how you built the reference database? Thank you. This has been a problem for me for a long time.

jiushao12345 commented 6 months ago

Dear @jiushao12345 , please maintain the separation between issues. If there is a different pending issue you can start a new issue. Thanks.

Thank you for the reminder. I will delete my comment once this is settled.

ShawRyu commented 6 months ago

@jiushao12345 Hi, I am replying to you on your own issue. Hope this will help :)

ShawRyu commented 6 months ago

That's strange that it worked on some samples and not the others. That leads me to believe it might be due to something specific about those files rather than the code itself. Do those files by any chance have a low/zero number of reads?

Hi Yor, you're right. When I used reformat.sh from BBMap to remove the short or empty reads from the failing samples, the .smp files were generated successfully.
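For anyone without BBMap at hand, the same kind of cleanup can be sketched in plain Python. This is not the command used above: it is an illustrative alternative, assuming four-line FASTQ records with mates in the same order in both files. The name `filter_pairs` and the `min_len` threshold are hypothetical, and a suitable threshold for ICRA is not stated in this thread.

```python
import gzip
from itertools import zip_longest


def _open(path, mode):
    """Open a plain or .gz file in text mode ('r' or 'w')."""
    return gzip.open(path, mode + "t") if str(path).endswith(".gz") else open(path, mode)


def _records(handle):
    """Yield FASTQ records as 4-tuples of lines (header, sequence, plus, quality)."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:  # end of file
            return
        yield record


def filter_pairs(in1, in2, out1, out2, min_len=1):
    """Write only pairs where both mates have >= min_len bases; return (kept, dropped)."""
    kept = dropped = 0
    with _open(in1, "r") as f1, _open(in2, "r") as f2, \
            _open(out1, "w") as o1, _open(out2, "w") as o2:
        for r1, r2 in zip_longest(_records(f1), _records(f2)):
            if r1 is None or r2 is None:
                raise ValueError("input files contain different numbers of reads")
            if min(len(r1[1].strip()), len(r2[1].strip())) >= min_len:
                o1.writelines(r1)
                o2.writelines(r2)
                kept += 1
            else:
                dropped += 1  # drop both mates so the files stay in sync
    return kept, dropped
```

Dropping both mates whenever either one is too short keeps the two output files paired, which matters because icra takes them as `--fq1` and `--fq2`.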