wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

minimum read length #358

Closed (lucyintheskyzzz closed this issue 4 months ago)

lucyintheskyzzz commented 4 months ago

Hi @wdecoster, is there a way to get the range of read lengths (minimum to maximum) in the NanoStats output? I only see the stats below.

From the summary file, my maximum read length appears to be 18248. What is the minimum read length?

Thanks!

General summary:
Mean read length: 282.7
Mean read quality: 9.9
Median read length: 249.0
Median read quality: 10.6
Number of reads: 1,518,294.0
Read length N50: 254.0
STDEV read length: 137.7
Total bases: 429,214,286.0

Number, percentage and megabases of reads above quality cutoffs
Q5: 1518292 (100.0%) 429.2Mb
Q7: 1516705 (99.9%) 428.8Mb
Q10: 994193 (65.5%) 277.6Mb
Q12: 294118 (19.4%) 80.5Mb
Q15: 12671 (0.8%) 3.7Mb

Top 5 highest mean basecall quality scores and their read lengths
1: 38.0 (1)
2: 36.0 (1)
3: 34.0 (1)
4: 34.0 (1)
5: 33.0 (1)

Top 5 longest reads and their mean basecall quality score
1: 18248 (11.4)
2: 14380 (10.9)
3: 7936 (9.1)
4: 7587 (9.1)
5: 7557 (9.5)

wdecoster commented 4 months ago

Hi,

I don't consider that a useful metric. Why do you think it would be good to know?

Wouter

lucyintheskyzzz commented 4 months ago

Determining the minimum read length can be helpful for viral shotgun metagenomic sequencing because of the unique characteristics of viral genomes. Viral genomes vary widely in size, from a few thousand to several hundred thousand base pairs. By establishing a minimum read length, I can ensure that even the smallest viral genomes present in the sample can be accurately sequenced and analyzed. It would also help me adjust my shotgun sequencing SOP if I am getting a higher percentage of shorter reads.
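
For anyone who wants the read length range without relying on the NanoStats summary, here is a minimal sketch that computes it directly from a FASTQ file using only the Python standard library. It does not use NanoPlot or NanoStats, and the function names (`read_lengths`, `length_summary`, `summarize_fastq`) and the `short_cutoff` parameter are hypothetical names chosen for illustration. It assumes standard 4-line FASTQ records (no wrapped sequence lines) and treats a `.gz` suffix as gzip compression.

```python
import gzip
from itertools import islice


def read_lengths(lines):
    """Yield the sequence length of each record in 4-line FASTQ text."""
    # Assumption: each record is exactly 4 lines, so the sequence is
    # the 2nd line of every record (indices 1, 5, 9, ...).
    for seq in islice(lines, 1, None, 4):
        yield len(seq.strip())


def length_summary(lengths, short_cutoff=200):
    """Return read count, min, max, and fraction of reads below short_cutoff."""
    n = shortest = longest = n_short = 0
    for length in lengths:
        shortest = length if n == 0 else min(shortest, length)
        longest = max(longest, length)
        n_short += length < short_cutoff  # True counts as 1
        n += 1
    if n == 0:
        raise ValueError("no reads found")
    return {
        "reads": n,
        "min": shortest,
        "max": longest,
        "fraction_short": n_short / n,
    }


def summarize_fastq(path, short_cutoff=200):
    """Summarize read lengths in a plain or gzipped FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return length_summary(read_lengths(handle), short_cutoff)
```

Calling `summarize_fastq("reads.fastq.gz", short_cutoff=200)` (the file name is a placeholder) returns a dictionary with the shortest and longest read lengths plus the share of reads under the cutoff, which is the figure to watch when deciding whether the SOP is producing too many short reads. The lengths are streamed rather than stored, so memory use stays flat even for millions of reads.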