wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

Difference in read quality output from bam and summary file #342

Closed (teodorabu closed this issue 1 year ago)

teodorabu commented 1 year ago

Hi, thanks for this great tool. I have a question about how the mean read quality score in the NanoStat.txt file is computed: why does this value differ depending on whether it is computed from the basecalling BAM file or from the summary file of the same run?

Thanks!

wdecoster commented 1 year ago

From the BAM file, it is computed by taking the read's Phred-scaled quality scores, converting them to error probabilities, taking the mean of those probabilities, and converting that mean back to the Phred scale. When the summary file is used, the value provided by the basecaller is used directly, without further calculation. ONT does not disclose what else is taken into account when calculating this "per read quality score".
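The BAM-based calculation described above can be sketched roughly as follows. This is only an illustration of the averaging steps, not NanoPlot's actual code: the function name `mean_read_quality` and the example scores are made up for this sketch.

```python
import math


def mean_read_quality(quals):
    """Average Phred scores via error probabilities (illustrative sketch).

    `quals` is assumed to be a list of Phred quality scores for one read.
    Returns None for an empty list.
    """
    if not quals:
        return None
    # Phred score Q corresponds to error probability p = 10^(-Q/10).
    error_probs = [10 ** (-q / 10) for q in quals]
    # Average in probability space, not on the Phred scale.
    mean_error = sum(error_probs) / len(error_probs)
    # Convert the mean error probability back to the Phred scale.
    return -10 * math.log10(mean_error)


quals = [10, 20, 30]  # hypothetical scores
print(round(mean_read_quality(quals), 2))  # 14.32
print(sum(quals) / len(quals))             # 20.0 (naive arithmetic mean)
```

Because the averaging happens in probability space, low-quality bases dominate the result, so it comes out lower than a plain arithmetic mean of the Phred scores. That alone can make it differ from a per-read value computed some other way.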

teodorabu commented 1 year ago

thanks for the answer!