Closed (e-fuhrmann closed 1 month ago)
Hey, I'm not from nanopore, but I faced this problem when first looking at qscores way back
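Mean qscore is not the arithmetic mean of the per-base quality scores. It is the Phred-scaled value of the mean error probability of your read: each quality score is converted to an error probability, the probabilities are averaged, and the average is converted back to a Q score. It is calculated with the following equation:
$$\text{mean qscore} = -10 \log_{10}\left(\frac{1}{N}\sum_{i=1}^{N} 10^{-q_i/10}\right)$$
where $q_i$ is the quality score of base $i$ and $N$ is the number of bases in the read.
In Python that's: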
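import numpy as np

def calculate_qscore(qstring):
    '''
    Calculate a mean qscore from a qstring (Phred+33 encoded).
    '''
    # Decode the ASCII characters to integer Phred scores (offset 33)
    qs = (np.array(qstring, 'c').view(np.uint8) - 33)
    # Convert each score to an error probability, 10**(-q/10), and average
    mean_err = np.exp(qs * (-np.log(10) / 10.)).mean()
    # Convert back to the Phred scale; the 1e-4 clamp caps the result at Q40
    score = -10 * np.log10(max(mean_err, 1e-4))
    return score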
If you run this code on your first qstring:
&&'.3;>@@;<@25B@DAB910***69,,+,8;?BAB=4BA.'.<>E?DAB<-*),//87557;???<887<243:9:9=>ADACB;;9-*.;?>6/11::<<<A<70'&&&&%
you get 13.47433188066914.
Dorado then floors this into an int, so you get 13.
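To make the two steps concrete (average the error probabilities, then floor), here is a minimal standard-library sketch. The names `mean_qscore` and `offset` are made up for illustration, and the `1e-4` clamp simply mirrors the snippet above; none of this is taken from Dorado's source.

```python
import math

def mean_qscore(qstring, offset=33):
    """Q score of the mean error probability (hypothetical helper)."""
    # Decode Phred+33 characters to integer quality scores.
    # Assumes a non-empty qstring; an empty one would divide by zero.
    phred = [ord(ch) - offset for ch in qstring]
    # Convert each score to an error probability and average them.
    mean_err = sum(10 ** (-q / 10) for q in phred) / len(phred)
    # Convert back to the Phred scale; the clamp caps the result at Q40.
    return -10 * math.log10(max(mean_err, 1e-4))

toy = "5I"  # two bases: '5' is Q20, 'I' is Q40
score = mean_qscore(toy)
print(round(score, 2))    # 22.97, dominated by the Q20 base
print(math.floor(score))  # 22, the floored integer value
```

The arithmetic mean of the two Q values would be 30, so even this two-base toy read shows how far the two calculations can drift apart.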
I hope that helps,
James
Hey there,
I understand now, that does the trick!
Thanks a lot!
Issue Report
Please describe the issue:
Hey there, I am currently working on some analyses in which I am particularly interested in the basecalling qscores (qs:f). However, I have a problem with reproducing the mean basecall qscores dorado generates. Whenever I calculate the mean basecall qscore manually for any given read (from bam-column QUAL), the result varies (sometimes drastically) from the qs:f-value dorado outputs. I don't quite understand how dorado arrives at these values.
This is based on my understanding (please correct me if I'm wrong here) that
Steps to reproduce the issue:
mean
mean(qual_num_test)
median
median(qual_num_test)
mode
uniq <- unique(qual_num_test)
tab <- tabulate(match(qual_num_test, uniq))
uniq[tab == max(tab)]
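For readers who prefer Python, the same three summary statistics can be sketched with the standard library. This assumes `qual_num_test` holds integer Phred scores decoded from the QUAL string with the usual offset of 33; the helper `decode_phred` and the toy QUAL string are hypothetical.

```python
from statistics import mean, median, multimode

def decode_phred(qual, offset=33):
    """Decode a Phred+33 QUAL string to integer scores (hypothetical helper)."""
    return [ord(ch) - offset for ch in qual]

# Toy QUAL string (not from a real read): Q20, Q20, Q40, Q10
qual_num_test = decode_phred("55I+")

print(mean(qual_num_test))       # 22.5
print(median(qual_num_test))     # 20.0
print(multimode(qual_num_test))  # [20], all most-frequent values, like the R snippet
```

All three operate directly on the Q values, which sit on a logarithmic scale, so none of them is expected to match a mean that is computed on error probabilities.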