nanoporetech / minknow_api

Protobuf and gRPC specifications for the MinKNOW API

How is Q-score calculated #44

Closed mwbudde closed 1 year ago

mwbudde commented 1 year ago

Can you describe how the Q-score is calculated for the minimum Q-score filter and the Q-score plots?

I set the minimum Q-score to 10, but I don't see how that shows up in the data. I would expect the per-read Q-scores to show a sharp cutoff at 10, but the distribution shows none. I would also like to be able to calculate whether the data passes Q20.

import numpy
import scipy.stats

# fastq is a Biopython SeqRecord parsed from a FASTQ file
fastq_mean = numpy.mean(fastq.letter_annotations["phred_quality"])
fastq_median = numpy.median(fastq.letter_annotations["phred_quality"])
fastq_mode = scipy.stats.mode(fastq.letter_annotations["phred_quality"])[0][0]
[image: plot of the Q-score distribution]

Thanks!

cjw85 commented 1 year ago

The read Q scores are calculated as in this blog post: https://labs.epi2me.io/quality-scores.

mwbudde commented 1 year ago

This question isn't about how the basecaller determines the Q-score. It is about how the Q-score for a read is calculated from the FASTQ data and displayed in MinKNOW.

I can use MinKNOW to filter reads so that only those with a Q-score of 10 or higher are kept. How would I perform that filtering without MinKNOW? I expected that I could just take the mode of the per-base Q-scores for each read, but that doesn't match the data, as shown above. Doing so should create a hard cutoff at 10, but we don't see one in the data.

Does the Q-score in the plot have any association with the Phred data in the FASTQ files? If so, what is the relationship?

[image]

cjw85 commented 1 year ago

See the section entitled "Guppy read Q-scores and read accuracies" in the blog post I linked above.

mwbudde commented 1 year ago

Got it, thanks for your help. It looks like the key is to average the per-base error probabilities first, and only then apply the log transformation to get back to a Q-score.
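
For anyone landing here later, the conclusion above can be sketched in a few lines of Python. This is a minimal illustration, not MinKNOW's or Guppy's actual implementation: the function names `read_mean_qscore` and `passes_min_qscore` are hypothetical, and the basecaller may apply further details that the linked blog post covers and this sketch does not. The idea is to convert each per-base Phred score Q to an error probability P = 10^(-Q/10), average those probabilities, and convert the mean back with Q = -10 * log10(P).

```python
import math


def read_mean_qscore(phred_quals):
    """Mean read Q-score: average the error probabilities, then go back to Phred scale.

    Hypothetical helper for illustration; not part of minknow_api.
    """
    if not phred_quals:
        raise ValueError("read has no quality values")
    # Per-base error probability: P = 10 ** (-Q / 10)
    error_probs = [10 ** (-q / 10) for q in phred_quals]
    mean_error = sum(error_probs) / len(error_probs)
    # Convert the mean error probability back to a Q-score
    return -10 * math.log10(mean_error)


def passes_min_qscore(phred_quals, min_qscore=10.0):
    """Return True if the read's mean Q-score meets the threshold (assumed inclusive)."""
    return read_mean_qscore(phred_quals) >= min_qscore


# Three good bases and one poor base: the poor base dominates the mean error.
quals = [30, 30, 30, 3]
arithmetic_mean = sum(quals) / len(quals)  # plain average of the Q values
read_q = read_mean_qscore(quals)           # average taken in probability space
print(f"arithmetic mean of Q: {arithmetic_mean:.2f}, read Q-score: {read_q:.2f}")
print("passes Q10:", passes_min_qscore(quals, 10))
```

With Biopython, the list passed in would be `fastq.letter_annotations["phred_quality"]` from the snippet earlier in the thread. Because low-quality bases dominate the mean error probability, the mean, median, or mode of the raw per-base Q values can sit well above the read Q-score, which is why those statistics show no sharp cutoff at the filter threshold.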