wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

Number of reads from NanoPlot doesn't match samtools #341

Closed KunFang93 closed 9 months ago

KunFang93 commented 9 months ago

Hi,

Thanks for providing this wonderful tool! I am a new NanoPlot user. This might be a silly question, but the number of reads shown in my NanoPlot-report.html doesn't match the number reported by samtools.

NanoPlot-report.html shows 6,973,806 reads, which matches what I expect from my experiments. However, samtools view -@ 30 -c gControl_trim_filt1k.0907.srt.bam gave me 48167056. Where does this difference come from? I also wondered whether the NanoPlot report can give the number of mapped reads (mapping rate).

Thanks in advance!

Best, Kun

wdecoster commented 9 months ago

You probably have reads that are unmapped and reads that are mapped to multiple locations (secondary alignments) which are ignored by NanoPlot. You can use -f and -F for samtools view to include or remove reads based on their flag.
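
To make the flag-based filtering concrete, here is a minimal Python sketch of the SAM flag bits involved. The bit values (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification. The function name `counted_like_nanoplot` is hypothetical and only mirrors the rule described in this comment (unmapped and secondary records are ignored); it is not NanoPlot's actual code.

```python
# SAM flag bits, as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

# Assumption: records with either of these bits set are ignored.
IGNORED = UNMAPPED | SECONDARY  # 0x104


def counted_like_nanoplot(flag: int) -> bool:
    """Return True if a record with this flag would be kept under the rule above."""
    return flag & IGNORED == 0


# A few example flags: forward primary, reverse primary, supplementary,
# unmapped, secondary, reverse secondary.
for flag in (0, 16, 2048, 4, 256, 272):
    print(flag, counted_like_nanoplot(flag))
```

Under the same assumption, `samtools view -c -F 0x104 file.bam` (with `file.bam` as a placeholder) should count the same set of records, because `-F` excludes every record that has any of the given bits set.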

KunFang93 commented 9 months ago

Thanks for your prompt reply! However, I still see a difference between NanoPlot and samtools.

I ran samtools flagstat -@ 30 gControl_trim_filt1k.0907.srt.bam, which gave me:

48167056 + 0 in total (QC-passed reads + QC-failed reads)
5341733 + 0 primary
41035977 + 0 secondary
1789346 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
48009783 + 0 mapped (99.67% : N/A)
5184460 + 0 primary mapped (97.06% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

If I understand correctly, 5,341,733 here is the number of reads counted by samtools? But NanoPlot gave 6,973,806... Please let me know if I did anything wrong. Thanks!

Kun

wdecoster commented 9 months ago

You get 6,973,806 by summing the primary aligned reads (5184460) with the supplementary aligned reads (1789346).
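
The sum can be checked directly from the flagstat output posted above. The following is a small sketch that parses the relevant flagstat lines and adds the "primary mapped" and "supplementary" counts; the helper `parse_flagstat` is a hypothetical name written for this illustration, not part of samtools or NanoPlot.

```python
import re

# Relevant lines copied from the samtools flagstat output in this thread.
FLAGSTAT = """\
48167056 + 0 in total (QC-passed reads + QC-failed reads)
5341733 + 0 primary
41035977 + 0 secondary
1789346 + 0 supplementary
48009783 + 0 mapped (99.67% : N/A)
5184460 + 0 primary mapped (97.06% : N/A)
"""


def parse_flagstat(text: str) -> dict:
    """Map each flagstat category name to its QC-passed count."""
    counts = {}
    for line in text.splitlines():
        # "<passed> + <failed> <category> (optional details)"
        m = re.match(r"(\d+) \+ (\d+) (.+?)(?: \(.*\))?$", line)
        if m:
            counts[m.group(3)] = int(m.group(1))
    return counts


counts = parse_flagstat(FLAGSTAT)

# Primary mapped plus supplementary alignments.
nanoplot_like = counts["primary mapped"] + counts["supplementary"]
print(nanoplot_like)  # 6973806

# Unmapped records, which are left out of the sum.
unmapped = counts["in total"] - counts["mapped"]
print(unmapped)  # 157273
```

The 157,273 unmapped records are also the difference between "primary" (5341733) and "primary mapped" (5184460), which is why the plain "primary" line does not match the NanoPlot number.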

KunFang93 commented 9 months ago

Ohhh, yes, thank you so much!

AsmaaSamyMohamedMahmoud commented 6 months ago

Hi, I'm using a fastq file as input, and the number of reads in NanoStat.txt equals the number of "primary" reads from samtools flagstat. This is not consistent with your previous answer. Could you clarify it?

Thanks,

wdecoster commented 6 months ago

Well it would be very helpful if you would share the numbers you get.

AsmaaSamyMohamedMahmoud commented 6 months ago

This is the number of reads from NanoStat: 4,604,680.0

This is the output from samtools flagstat:

5691407 + 0 in total (QC-passed reads + QC-failed reads)
4604680 + 0 primary
870229 + 0 secondary
216498 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
3932516 + 0 mapped (69.10% : N/A)
2845789 + 0 primary mapped (61.80% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Thank you,

wdecoster commented 6 months ago

The original question was about the difference between running NanoPlot on a bam and running samtools flagstat on the same bam. You are comparing NanoPlot on the fastq with samtools flagstat on the bam, and therefore my previous answer doesn't apply.
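
As a rough illustration of why the fastq count lines up with the "primary" line, the numbers from the second flagstat output can be checked as follows. This is a sketch that assumes the aligner wrote every input read to the bam, including unmapped ones, so that each fastq record has exactly one primary record. The variable names are made up for this example.

```python
# Counts copied from the second flagstat output in this thread.
total = 5691407
primary = 4604680
secondary = 870229
supplementary = 216498
mapped = 3932516
primary_mapped = 2845789

# Every bam record is exactly one of primary, secondary or supplementary.
assert primary + secondary + supplementary == total

# Unmapped records carry neither the secondary nor the supplementary flag,
# so flagstat counts them under "primary".
unmapped = total - mapped
fastq_like = primary_mapped + unmapped
print(fastq_like)  # 4604680, the NanoStat count on the fastq

# By the rule from the earlier answer, a bam-based count would instead be
# expected to be primary mapped + supplementary (not verified in this thread).
bam_like = primary_mapped + supplementary
print(bam_like)  # 3062287
```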

AsmaaSamyMohamedMahmoud commented 6 months ago

Thank you for clarifying it.