sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

pigz: skipping: /media/Project1/00_Fastq/finaltest/MergedRead_2.fastq.gz: corrupted -- invalid deflate data (invalid block type) pigz: abort: internal threads error #312

Closed · parichitran closed this issue 2 years ago

parichitran commented 2 years ago

Hi Christoph/Swati,

My CEL-Seq RNA-seq data with incorporated UMIs has already been demultiplexed into individual cells, each with its own cell barcode (8 bp) and UMI (5 bp). So I am concatenating all of my paired-end reads into merged forward and reverse read files to run zUMIs, as we already discussed in the issue below: Our previous discussion output.txt yamlfile.txt
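For context, the merged files were produced by concatenating the per-cell gzip files. Below is a minimal sketch of that step, assuming the demultiplexed files follow a *_R1.fastq.gz / *_R2.fastq.gz naming pattern (the pattern and the MergedRead_1.fastq.gz name are assumptions here); the point is to keep read-1 and read-2 in the same cell order.

```bash
# Concatenate per-cell FASTQs into merged R1/R2 files.
# Concatenated gzip members form a valid gzip stream, so no re-compression is needed.
cd /media/Project1/00_Fastq/finaltest

ls *_R1.fastq.gz | sort > r1_files.txt
sed 's/_R1/_R2/' r1_files.txt > r2_files.txt   # keep R2 in the same per-cell order as R1

xargs cat < r1_files.txt > MergedRead_1.fastq.gz
xargs cat < r2_files.txt > MergedRead_2.fastq.gz
```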

As per that discussion, I first ran zUMIs on 10 cells alone and it ran well with no issues; I then tried 177 cells and it also ran well. Now that I have tried to run my entire set of 2684 cells, it throws the error above. Since the message says the file is corrupted, I rechecked whether the files were downloaded properly; all of them were, so I guess there is some other issue with the file. Read file sizes: read-1 = 94.7 GB, read-2 = 235.3 GB.

I am attaching my YAML file and terminal output here.

cziegenhain commented 2 years ago

As you say, it really seems like an issue with the read-2 input file. You should check that all of the individual files are intact and in valid gzip format, and that the concatenation ran fine.
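Something like the loop below can locate a broken input file (a rough sketch; it assumes the per-cell files sit in the same directory as the merged files, and the MergedRead_1 name is assumed by analogy, so adjust the paths to your layout):

```bash
# Test each gzip file for integrity; pigz -t (or gzip -t) decompresses without
# writing output and returns non-zero if the data is corrupted.
for f in /media/Project1/00_Fastq/finaltest/*.fastq.gz; do
    if ! pigz -t "$f" 2>/dev/null; then
        echo "CORRUPTED: $f"
    fi
done

# Also confirm the merged files decompress cleanly end to end.
pigz -t /media/Project1/00_Fastq/finaltest/MergedRead_1.fastq.gz && echo "R1 OK"
pigz -t /media/Project1/00_Fastq/finaltest/MergedRead_2.fastq.gz && echo "R2 OK"
```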

parichitran commented 2 years ago

Thanks Christoph for your lightning-fast reply :) I will check and get back to you.

parichitran commented 2 years ago

Dear Christoph,

The corrupted file above was due to the incorrect gzip format of one of the cells' FASTQ files, so I re-downloaded it and ran the demultiplexing for all of my cells again. This time it ran well up to the filtering step and picked up all expected cell barcodes. But the problem starts at the mapping step, where I get the error below:

EXITING because of fatal error: buffer size for SJ output is too small
Solution: increase input parameter --limitOutSJcollapsed

YAML file and terminal output for the above error: yamlfileandoutputwithSJbuffer.txt
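In the zUMIs YAML, such a STAR flag can be passed through the additional_STAR_params field of the reference block. A minimal sketch, assuming a zUMIs 2.x style config with placeholder paths:

```yaml
# Fragment of the zUMIs YAML; only the reference block is shown and the paths are placeholders.
reference:
  STAR_index: /path/to/STAR_index
  GTF_file: /path/to/annotation.gtf
  additional_STAR_params: --limitOutSJcollapsed 2000000000
```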

So I increased the --limitOutSJcollapsed parameter to 2000000000 (default = 1000000). This time there are no errors regarding the SJ output, but mapping is not happening at all, and I get the same downstream errors as before. YAML file and its output with --limitOutSJcollapsed set to 2000000000: yamlfileandoutput.txt
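To double-check whether STAR mapped anything at all, its summary log and the aligned BAM in the zUMIs output directory can be inspected directly. A rough sketch (the file names below are assumptions based on typical STAR/zUMIs output and may differ in your run):

```bash
# STAR writes a summary log; this line reports the overall unique mapping rate.
grep "Uniquely mapped reads %" /path/to/zUMIs_output/*Log.final.out

# Count mapped records in the aligned BAM (requires samtools; the BAM name is an assumption).
samtools flagstat /path/to/zUMIs_output/project.filtered.tagged.Aligned.out.bam
```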