wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License
413 stars 47 forks

Nanoplot crashed with latest version #271

Closed prasundutta87 closed 2 years ago

prasundutta87 commented 2 years ago

Hi @wdecoster,

Just pasting the crash error for your reference. I concatenated the fastq.gz files from here: ftp://ftp.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/UCSC_Ultralong_OxfordNanoporePromethion/GM24385*.fastq.gz. I am planning to benchmark my SV calling pipeline on ONT data. After concatenating them with 'cat', I ran NanoPlot (v1.34.1) and got this error. I am now running NanoPlot on the three individual files and will merge the BAM files after alignment, but I just wanted to know how I can deal with this error.

```
2021-09-17 23:37:53,319 NanoPlot 1.34.1 started with arguments Namespace(N50=False, alength=False, bam=None, barcoded=False, color='#4CB391', colormap='Greens', cram=None, downsample=None, dpi=100, drop_outliers=False, fasta=None, fastq=['/home/u027/pdutta/Benchmarking_SVs/raw_data/GM24385.fastq.gz'], fastq_minimal=None, fastq_rich=None, feather=None, font_scale=1, format='png', hide_stats=False, huge=False, info_in_report=False, listcolormaps=False, listcolors=False, loglength=True, maxlength=None, minlength=None, minqual=None, no_N50=False, no_supplementary=False, outdir='QC_before_filtering/GM24385', path='QC_before_filtering/GM24385/', percentqual=False, pickle=None, plots=['dot'], prefix='', raw=False, readtype='1D', runtime_until=None, store=False, summary=None, threads=4, title=None, tsv_stats=True, ubam=None, verbose=False)
2021-09-17 23:37:53,319 Python version is: 3.8.10 (default, May 19 2021, 18:05:58) [GCC 7.3.0]
2021-09-17 23:37:53,320 NanoPlot: valid output format png
2021-09-17 23:37:53,335 Nanoget: Starting to collect statistics from plain fastq file.
2021-09-17 23:37:53,336 Nanoget: Decompressing gzipped fastq /home/u027/pdutta/Benchmarking_SVs/raw_data/GM24385.fastq.gz
2021-09-18 01:33:46,420 Error -3 while decompressing data: invalid literal/length code
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/site-packages/nanoget/extraction_functions.py", line 321, in process_fastq_plain
    data=[res for res in extract_from_fastq(inputfastq) if res],
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/site-packages/nanoget/extraction_functions.py", line 321, in <listcomp>
    data=[res for res in extract_from_fastq(inputfastq) if res],
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/site-packages/nanoget/extraction_functions.py", line 331, in extract_from_fastq
    for rec in SeqIO.parse(fq, "fastq"):
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 73, in __next__
    return next(self.records)
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/site-packages/Bio/SeqIO/QualityIO.py", line 1080, in iterate
    for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/site-packages/Bio/SeqIO/QualityIO.py", line 956, in FastqGeneralIterator
    for line in handle:
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/gzip.py", line 305, in read1
    return self._buffer.read1(size)
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/gzip.py", line 487, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid literal/length code
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/site-packages/nanoplot/NanoPlot.py", line 59, in main
    datadf = get_input(
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/site-packages/nanoget/nanoget.py", line 92, in get_input
    dfs=[out for out in executor.map(extraction_function, files)],
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/site-packages/nanoget/nanoget.py", line 92, in <listcomp>
    dfs=[out for out in executor.map(extraction_function, files)],
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/u027/project/software/conda/envs/SGP2/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
zlib.error: Error -3 while decompressing data: invalid literal/length code
```

Regards, Prasun
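[Editor's note] The `zlib.error` in the traceback above comes from plain gzip decompression, so it can be reproduced without NanoPlot at all. A hypothetical standalone checker (not part of nanoget) that streams a gzip file and reports where decompression fails might look like this:

```python
import gzip
import zlib

def check_gzip(path, chunk_size=1 << 20):
    """Stream-decompress a gzip file; return (ok, decompressed_bytes_read)."""
    read = 0
    try:
        with gzip.open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:  # clean end of the (possibly multi-member) stream
                    return True, read
                read += len(chunk)
    except (zlib.error, EOFError, OSError) as err:
        # zlib.error: corrupt deflate data; EOFError: truncated member;
        # OSError (incl. gzip.BadGzipFile): bad header or CRC mismatch.
        print(f"{path}: failed after ~{read} decompressed bytes: {err}")
        return False, read
```

Running it over each downloaded file and over the concatenated file would show whether the corruption originated in the download or in the concatenation step.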

wdecoster commented 2 years ago

Hi Prasun,

I think this suggests that your fastq file is corrupted.

Wouter

prasundutta87 commented 2 years ago

Hi @wdecoster ,

You may be right. The cat command somehow corrupts the fastq file during concatenation; I tried it twice and got the same error both times. I just processed the files individually instead, planning to concatenate the BAM files downstream, and it worked perfectly.

Regards, Prasun
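[Editor's note] Before rerunning NanoPlot on a rebuilt concatenation, a cheap structural sanity check is to confirm that the decompressed file consists of complete four-line fastq records. A hypothetical helper (not part of NanoPlot or nanoget):

```python
import gzip

def fastq_gz_looks_sane(path):
    """Stream a (possibly multi-member) fastq.gz and check record structure."""
    n = 0
    with gzip.open(path, "rt") as fh:
        for n, line in enumerate(fh, start=1):
            if n % 4 == 1 and not line.startswith("@"):
                return False  # each record must begin with an '@' header line
    return n > 0 and n % 4 == 0  # no truncated trailing record
```

This only checks fastq framing; a corrupt gzip stream would still raise `zlib.error` while iterating, exactly as in the traceback above.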

wdecoster commented 2 years ago

Hmmm, cat usually works fine for me with fastq.gz files. Which command did you use exactly? And how were these files compressed?
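[Editor's note] The gzip format is explicitly designed so that members concatenate: `cat a.fastq.gz b.fastq.gz > both.fastq.gz` produces a valid multi-member gzip file, and Python's gzip module (which nanoget uses) reads such files transparently. A minimal sketch of that property:

```python
import gzip

# Two standalone gzip members, mimicking two downloaded .fastq.gz files.
part_a = gzip.compress(b"@r1\nACGT\n+\n!!!!\n")
part_b = gzip.compress(b"@r2\nTTTT\n+\n####\n")

# Byte-for-byte concatenation is exactly what `cat a.gz b.gz > both.gz` does.
combined = part_a + part_b

# The gzip module decompresses every member, not just the first.
assert gzip.decompress(combined) == b"@r1\nACGT\n+\n!!!!\n@r2\nTTTT\n+\n####\n"
```

So cat itself cannot corrupt an intact pair of inputs; a decompression error in the combined file points at one of the inputs (or the download) being damaged.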

prasundutta87 commented 2 years ago

I did not compress the files myself; they were already compressed by GIAB. I just downloaded them with wget from the link shared above and used cat *.fastq.gz >