nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

understanding modified bases threshold setting in Dorado #946

Open DepledgeLab opened 1 month ago

DepledgeLab commented 1 month ago

Issue Report

I'm using Dorado v0.7.0 for all-context m6A calling, but I am uncertain how to interpret the --modified-bases-threshold parameter.

--modified-bases-threshold the minimum predicted methylation probability for a modified base to be emitted in an all-context model, [0, 1] [default: 0.05]

Am I correct to interpret this as meaning that Dorado will report a site as m6A-modified if the methylation probability for an individual nucleotide in a single read is 5% or higher? Why is this value set so low?

As a further question, is it possible to switch between DRACH and all-context calling with the rna004_130bps_sup@v5.0.0 model, or can DRACH calling only be achieved with rna004_130bps_sup@v3.0.1?

malton-ont commented 1 month ago

Hi @DepledgeLab,

You're not quite right, no. Dorado will emit the probability that a base is modified if that probability passes this threshold. If it is below the threshold, dorado is sufficiently confident that the base is not modified that it simply presumes it to be a canonical base and lists it as skipped in the MM tag. To put it another way, dorado has to be 95% sure a base is unmodified before it will make that decision itself rather than leaving that level of filtering to the user.
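
As a rough illustration of the behaviour described above (not dorado's actual code), the threshold can be thought of as a filter on what gets written, not as a modification call. The function name `emit_mod_probs` and the example probabilities are made up for this sketch, and treating the threshold as inclusive (`>=`) is an assumption based on the help text calling it a "minimum".

```python
from typing import List, Optional


def emit_mod_probs(probs: List[float], threshold: float = 0.05) -> List[Optional[float]]:
    """Hypothetical model of the emit step.

    Returns the modification probability for each candidate base if it
    would be written to the output, or None if the base would be treated
    as canonical and skipped. Assumes the threshold is inclusive.
    """
    return [p if p >= threshold else None for p in probs]


# Made-up per-base m6A probabilities for one read.
site_probs = [0.01, 0.04, 0.05, 0.50, 0.97]

emitted = emit_mod_probs(site_probs)
print(emitted)  # [None, None, 0.05, 0.5, 0.97]
```

Note that the base at 0.05 is emitted with its probability, but nothing here says it is modified. Deciding that is left to whatever threshold the user applies afterwards.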

There is no DRACH model for v5, no.

DHmeduni commented 1 month ago

Hi, could you expand on this? I don't quite understand the explanation. Does this mean that --modified-bases-threshold changes nothing about the base call itself, and only controls whether the probability of the call being correct is written to file? Thanks!

malton-ont commented 1 month ago

Hi @DHmeduni,

Yes, exactly.
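
Since the filtering is left to the user, a downstream step might look something like the sketch below. It assumes the emitted probabilities are stored in the ML tag as 8-bit integers, where, per the SAM optional-tags specification, a value N covers the probability range N/256 to (N+1)/256. The helper names and the 0.9 cut-off are hypothetical choices for illustration, not a recommended setting.

```python
from typing import List


def ml_to_prob(ml: int) -> float:
    """Convert an ML value (0-255) to the midpoint of its probability bin."""
    return (ml + 0.5) / 256


def call_modified(ml_values: List[int], user_threshold: float = 0.9) -> List[bool]:
    """Apply a user-chosen threshold to emitted modification probabilities."""
    return [ml_to_prob(v) >= user_threshold for v in ml_values]


# Made-up ML values for three emitted bases in one read.
calls = call_modified([13, 128, 250])
print(calls)  # [False, False, True]
```

Here the first base (ML 13, roughly 0.05) would have been emitted under the default --modified-bases-threshold, but it is not called as modified under the stricter user threshold.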