wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

ValueError: Invalid character in quality string #340

Closed Sreelekshmi-291 closed 8 months ago

Sreelekshmi-291 commented 1 year ago

I am running NanoPlot on three fastq files: the original fastq, a fastq with its simplex reads, and a fastq with its duplex reads. NanoPlot crashes with the error "ValueError: Invalid character in quality string" for the original fastq but runs fine for the simplex and duplex ones. I checked the unaligned BAM to fastq conversion step to see whether the generated fastq is truncated, but that log file did not show any error. Also, the source BAM used to generate everything is intact and not truncated. Here is the NanoPlot log file:

2023-09-06 12:33:41,910 Python version is: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:58:44) [GCC 11.3.0]
2023-09-06 12:33:42,161 Nanoget: Starting to collect statistics from plain fastq file.
2023-09-06 12:33:42,162 Nanoget: Decompressing gzipped fastq /home/sofia/Mala_Quartet/BNG69/BNG69_pass.fastq.gz
2023-09-06 13:12:21,982 Invalid character in quality string
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/extraction_functions.py", line 396, in process_fastq_plain
    data=[res for res in extract_from_fastq(inputfastq) if res],
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/extraction_functions.py", line 396, in <listcomp>
    data=[res for res in extract_from_fastq(inputfastq) if res],
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/extraction_functions.py", line 407, in extract_from_fastq
    for rec in SeqIO.parse(fq, "fastq"):
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
    return next(self.records)
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/Bio/SeqIO/QualityIO.py", line 1134, in iterate
    raise ValueError("Invalid character in quality string") from None
ValueError: Invalid character in quality string
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoplot/NanoPlot.py", line 61, in main
    datadf = get_input(
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/nanoget.py", line 110, in get_input
    dfs=[out for out in executor.map(extraction_function, files)],
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/site-packages/nanoget/nanoget.py", line 110, in <listcomp>
    dfs=[out for out in executor.map(extraction_function, files)],
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/process.py", line 575, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/sofia/.conda/envs/nanoplot/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
ValueError: Invalid character in quality string

wdecoster commented 1 year ago

Hi,

That error is raised by Biopython, which NanoPlot uses for parsing the fastq file. It is quite lenient regarding fastq formatting, but it doesn't seem to like your file :)

Do you think you could share it?

Wouter
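
If sharing the full file is impractical, one way to narrow the problem down is to scan the fastq directly and report the first few records whose quality line looks wrong. The sketch below is not part of NanoPlot or nanoget. The helper names `open_text` and `find_bad_records` are made up for illustration, and it assumes the file uses the plain 4-line-per-record layout and that valid Sanger quality characters fall in ASCII 33 to 126, which is my understanding of what Biopython's "fastq" parser accepts.

```python
import gzip
from itertools import islice


def open_text(path):
    """Open a plain or gzipped fastq as text (hypothetical helper)."""
    # latin-1 maps every byte to a character, so stray bytes never raise a
    # decode error; newline="\n" keeps any carriage returns visible.
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1", newline="\n")
    return open(path, "rt", encoding="latin-1", newline="\n")


def find_bad_records(path, limit=10):
    """Return up to `limit` tuples (record_number, header, reason).

    Assumption: every record is exactly 4 lines (header, sequence, '+',
    quality). Multi-line fastq records are reported as a layout problem.
    """
    problems = []
    with open_text(path) as handle:
        record_number = 0
        while len(problems) < limit:
            chunk = list(islice(handle, 4))
            if not chunk:
                break  # clean end of file
            record_number += 1
            lines = [line.rstrip("\r\n") for line in chunk]
            if len(lines) < 4:
                problems.append((record_number, lines[0], "truncated record"))
                break
            header, seq, plus, qual = lines
            if not header.startswith("@") or not plus.startswith("+"):
                # Once the 4-line framing is lost, later records are unreliable.
                problems.append((record_number, header, "not in 4-line layout"))
                break
            # Assumed valid range for Sanger-encoded qualities: ASCII 33..126.
            bad = sorted({c for c in qual if not 33 <= ord(c) <= 126})
            if bad:
                problems.append(
                    (record_number, header, f"invalid quality characters: {bad!r}")
                )
            elif len(seq) != len(qual):
                problems.append(
                    (record_number, header,
                     f"length mismatch: sequence {len(seq)}, quality {len(qual)}")
                )
    return problems
```

Calling `find_bad_records("BNG69_pass.fastq.gz")` and printing each tuple should give a record number and read header that can be extracted and shared on their own. A "not in 4-line layout" result partway through the file would suggest that records were interleaved or cut off during the BAM to fastq conversion rather than a single bad character.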