wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

Not printing readID when run on fastq files #368

Closed: pclavell closed this issue 1 month ago

pclavell commented 1 month ago

Hello, I have been running a couple of NanoPlot tests because I want to add it as a QC step in a nanopore pipeline. When I run NanoPlot on a uBAM, I get a file called {my_sample}-data.tsv.gz with readID, quals, and length columns. However, when I run it on a FASTQ, the readIDs are not reported. I need them because I appended extra information to each readID so that I can stratify qualities by other parameters, such as whether a read is duplex or simplex, and other custom annotations. Do you know what could be happening?

Thanks a lot
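
For context, the stratification described above could look roughly like the sketch below. It is only an illustration: the column names (`readIDs`, `quals`), the `|` separator, and the `duplex`/`simplex` tag values are assumptions, since the thread does not say how the extra information is appended to the readID or how the TSV columns are named.

```python
import csv
from collections import defaultdict


def mean_quality_by_tag(tsv_lines, id_col="readIDs", qual_col="quals", sep="|"):
    """Group reads by the tag after `sep` in the read ID and average their quality.

    `tsv_lines` is any iterable of tab-separated lines with a header row.
    The column names and the separator are assumptions; adjust them to the real file.
    Returns {tag: (read_count, mean_of_per_read_quality)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        # rpartition returns an empty separator when `sep` is absent from the ID
        _, found, tag = row[id_col].rpartition(sep)
        if not found:
            tag = "untagged"
        sums[tag] += float(row[qual_col])
        counts[tag] += 1
    # Plain arithmetic mean of the per-read values, kept simple on purpose
    return {tag: (counts[tag], sums[tag] / counts[tag]) for tag in counts}
```

To run it on the compressed file, open it in text mode first, for example `gzip.open("my_sample-data.tsv.gz", "rt")`, and pass the handle as `tsv_lines`.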

wdecoster commented 1 month ago

Hi,

Well, yeah, those readIDs are simply not extracted from a FASTQ. There is honestly no good reason for that :-) Are you using --fastq or --fastq_rich? Do you only need the TSV file? That is the one from --raw, I assume?

Wouter

pclavell commented 1 month ago

I am using --fastq. Yes, I only use the TSV from --raw.

wdecoster commented 1 month ago

Okay, then I don't have to change NanoPlot to help you. I have added a much trimmed-down version of the code to the scripts folder: https://github.com/wdecoster/NanoPlot/blob/master/scripts/fastq_to_tsv.py Does that work for you?

Currently, it is single-threaded, as I don't know if you have a ton of data to process, but perhaps this is fast enough.
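
To illustrate the general idea of such a conversion, here is a minimal single-threaded sketch. It is not the linked script. It assumes plain 4-line FASTQ records with Phred+33 qualities, averages quality via the mean error probability (an assumption about how the per-read value should be computed), and uses hypothetical column names.

```python
import gzip
import math


def mean_quality(qual_string, offset=33):
    """Average Phred score of a read, computed from the mean error probability."""
    if not qual_string:
        return 0.0
    error_probs = [10 ** (-(ord(char) - offset) / 10) for char in qual_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def fastq_rows(lines):
    """Yield (read_id, mean_quality, length) from an iterable of FASTQ lines.

    Assumes 4 lines per record. The full header line (minus '@') is kept,
    so anything appended to the read ID survives.
    """
    lines = iter(lines)
    for header in lines:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(lines, "").rstrip("\r\n")
        next(lines, "")  # '+' separator line, ignored
        qual = next(lines, "").rstrip("\r\n")
        yield header[1:], mean_quality(qual), len(seq)


def fastq_to_tsv(fastq_path, tsv_path):
    """Write one TSV row per read; column names here are hypothetical."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as fastq, open(tsv_path, "w") as out:
        out.write("readIDs\tquals\tlengths\n")
        for read_id, qual, length in fastq_rows(fastq):
            out.write(f"{read_id}\t{qual:.2f}\t{length}\n")
```

A call such as `fastq_to_tsv("reads.fastq.gz", "reads.tsv")` (file names are placeholders) streams the input one record at a time, so memory use stays flat regardless of file size.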

pclavell commented 1 month ago

Hey, yes, it works, thanks a lot! Let's see how it performs on a FASTQ with 15M reads (I have 45 of them).