mossmatters / HybPiper

Recovering genes from targeted sequence capture data
GNU General Public License v3.0

Pipeline stops at distribute_reads_to_targets #138

Closed paigethebotanist closed 6 months ago

paigethebotanist commented 6 months ago

Hello,

I have one sample out of ~70 that is not assembling. I don't get errors per se, but HybPiper just stops when it is distributing paired reads to gene directories. Here is what I am seeing:

Traceback (most recent call last):
  File "/home/[user]/miniconda3/envs/hybpiper2/bin/hybpiper", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/[user]/miniconda3/envs/hybpiper2/lib/python3.11/site-packages/hybpiper/assemble.py", line 1873, in main
    args.func(args)
  File "/home/[user]/miniconda3/envs/hybpiper2/lib/python3.11/site-packages/hybpiper/assemble.py", line 1444, in assemble
    distribute_bwa(bamfile, readfiles, targetfile, target, unpaired_readfile, args.exclude,
  File "/home/[user]/miniconda3/envs/hybpiper2/lib/python3.11/site-packages/hybpiper/assemble.py", line 738, in distribute_bwa
    distribute_reads_to_targets.distribute_reads(readfiles, read_hit_dict, merged=merged, single_end=single_end,
  File "/home/[user]/miniconda3/envs/hybpiper2/lib/python3.11/site-packages/hybpiper/distribute_reads_to_targets.py", line 328, in distribute_reads
    ID2_long, Seq2, Qual2 = next(iterator2)
                            ^^^^^^^^^^^^^^^
StopIteration

And the last lines of my .log file:

2024-02-20 10:03:21,743 - assemble.py - hybpiper.assemble - assemble - DEBUG - bamfile is: MENPRO895_3.bam
2024-02-20 10:03:21,748 - distribute_reads_to_targets.py - hybpiper.assemble.hybpiper.distribute_reads_to_targets - read_sorting_bwa - INFO - [INFO]: Gathering IDs for mapped reads...
2024-02-20 10:03:22,939 - assemble.py - hybpiper.assemble - distribute_bwa - INFO - [INFO]: In total, 135066 reads from the paired-end read files will be distributed to gene directories
2024-02-20 10:03:22,939 - distribute_reads_to_targets.py - hybpiper.assemble.hybpiper.distribute_reads_to_targets - distribute_reads - DEBUG - Distributing reads from gzipped file MENPRO895_R1.fastq.gz
2024-02-20 10:03:25,129 - distribute_reads_to_targets.py - hybpiper.assemble.hybpiper.distribute_reads_to_targets - distribute_reads - INFO - [NOTE]: Distributing paired reads to gene directories
2024-02-20 10:03:25,130 - distribute_reads_to_targets.py - hybpiper.assemble.hybpiper.distribute_reads_to_targets - distribute_reads - DEBUG - [NOTE]: Distributing reads from gzipped file MENPRO895_R2.fastq.gz

Based on solutions to other similar issues, I have tried:

The funny thing is, I have successfully assembled this sample before, but I need to redo it. I'm just totally at a loss! Thank you for any advice you can offer. For reference, I am on a home PC with Ubuntu and running everything through Miniconda. Let me know what other information you need.

Thank you!

chrisjackson-pellicle commented 6 months ago

Hi @paigethebotanist,

Hmm - is it possible that your read files have been modified since you last successfully assembled this sample? Can you confirm that your read files have the same number of fastq sequences in them? The error suggests that there are fewer sequences in your MENPRO895_R2.fastq.gz file than in your MENPRO895_R1.fastq.gz file.
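One way to run that check is a short Python sketch like the one below. It is not part of HybPiper: the function names `count_fastq_records` and `check_pair` are hypothetical, and it assumes gzipped files with standard four-line FASTQ records (no wrapped sequence lines).

```python
import gzip


def count_fastq_records(path):
    """Count records in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        # A remainder usually means the file was truncated mid-record.
        raise ValueError(
            f"{path}: {line_count} lines is not a multiple of 4 (truncated file?)"
        )
    return line_count // 4


def check_pair(r1_path, r2_path):
    """Return (n_r1, n_r2, counts_match) for a pair of read files."""
    n_r1 = count_fastq_records(r1_path)
    n_r2 = count_fastq_records(r2_path)
    return n_r1, n_r2, n_r1 == n_r2
```

Calling `check_pair("MENPRO895_R1.fastq.gz", "MENPRO895_R2.fastq.gz")` would return the two counts and whether they match. If R2 has fewer records than R1, that fits the `StopIteration` raised by `next(iterator2)` in the traceback.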

Cheers,

Chris

paigethebotanist commented 6 months ago

Hi @chrisjackson-pellicle,

Thanks so much for your speedy reply. I don't know how the read files could have been changed, but you're right: judging by the file sizes, the R1 and R2 files had different numbers of reads. I re-uploaded the raw files, cleaned them, and re-ran HybPiper with success.

Thanks again!

Paige

chrisjackson-pellicle commented 6 months ago

Glad it was an easy fix!

Cheers,

Chris