Closed Liukvr closed 1 year ago
Hi Luca,
That Q-score is just something the basecaller made up or calculated. It doesn't know the true accuracy of the read. It just thinks, "well, this signal looks pretty decent, so I'll give it a high quality". Based on what you show here, it is not well-calibrated.
Wouter
Hi Wouter,
Thanks for the explanation. The plot was generated from ONT reads basecalled with Guppy v6.3.8. Naively, I did not expect so many reads to show such a poor correlation between quality and identity. In your experience, is this a typical identity plot for ONT reads? Have you already faced situations where the Q value turned out to be overestimated by the basecaller?
Thanks in advance,
Luca
It seems most of your reads are at the expected accuracies, looking at the top histogram.
It would presumably be more informative to convert those empirical percent identities to the Phred scale, and plot the accuracy "according to the basecaller" vs "according to the aligner". Note also that structural variants may lower the reference identity, which is an argument for using a gap-compressed reference identity (https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity).
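A minimal sketch of both ideas, in case it helps. It is not NanoPlot code: the function names, the cap of Q60 for perfect alignments, and the assumption that you have a CIGAR string plus the NM tag for each read are all illustrative choices. The gap-compressed identity follows my reading of the linked post, where each run of inserted or deleted bases counts as a single difference.

```python
import math
import re


def identity_to_phred(percent_identity, cap=60.0):
    """Convert a percent identity (0-100) to the Phred scale.

    `cap` is an arbitrary ceiling (assumption) because a perfect
    alignment has an error rate of 0 and no finite Phred score.
    """
    error = 1.0 - percent_identity / 100.0
    if error <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(error))


# Matches one CIGAR operation, e.g. "50M" or "10D".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def gap_compressed_identity(cigar, nm):
    """Percent identity where each gap counts once, whatever its length.

    cigar: CIGAR string of the alignment.
    nm:    value of the NM tag (mismatches + inserted + deleted bases).
    """
    aligned_cols = ins_bases = del_bases = gap_opens = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned_cols += n
        elif op == "I":
            ins_bases += n
            gap_opens += 1
        elif op == "D":
            del_bases += n
            gap_opens += 1
    # Remove gap bases from NM to recover the number of mismatches.
    mismatches = nm - ins_bases - del_bases
    matches = aligned_cols - mismatches
    return 100.0 * matches / (aligned_cols + gap_opens)


# A read that matches perfectly except for one 10 bp deletion:
# the single deletion costs one difference instead of ten.
gc_identity = gap_compressed_identity("50M10D50M", nm=10)
print(round(gc_identity, 2), round(identity_to_phred(gc_identity), 1))
```

With the plain definition (matches over all alignment columns) the same read would score 100/110, so a single structural variant would drag it far below what the basecaller's Q-score suggests even though no base was miscalled.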
Dear NanoPlot developer, I'm using NanoPlot to assess the percent identity of reads from an ONT plant sequencing run on a P2 instrument. Looking at the tsv file output by NanoPlot, we noticed that some reads with an average read quality greater than 20 (i.e. a read identity around 99% would be expected) have a percent identity far below 99%.
Here are some examples:
Resulting in the following plot:
Have you already encountered a situation like this? If so, how did you explain it? Thanks in advance, Luca