Question regarding the new dna_r10.4.1_e8.2_400bps_sup@v5.0.0 Model

Hello @AzlanNI,

When you use the dna_r10.4.1_e8.2_400bps_sup@v5.0.0_4mC_5mC modified base model, every C in every sequencing read will have an associated probability of being 5mC, 4mC, or canonical ($1 - p_{\text{5mC}} - p_{\text{4mC}}$). When you then run modkit pileup (or modkit extract with --read-calls), these probabilities are converted into base modification "calls" (i.e. classifications) by the filtering algorithm. That algorithm may seem complicated, but under most circumstances it just picks the modification state with the highest probability, or filters out that site if the highest probability isn't high enough, meaning the model isn't confident in the prediction. With modkit pileup, the resulting table counts the number of reads that had each modification state at each genomic position; the schema for the table is in the modkit documentation and is nicely compatible with most genome viewers. I hope this answers your question; please let me know if it doesn't.
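To make the calling and counting steps a bit more concrete, here is a rough Python sketch of the idea. It is not modkit's actual implementation: the function names `call_base` and `pileup_counts` are hypothetical, and the fixed threshold of 0.8 is only an illustrative assumption (modkit chooses its own threshold).

```python
from collections import Counter

def call_base(p_5mc, p_4mc, threshold=0.8):
    """Return the most probable state for one C, or None if it is filtered.

    The threshold of 0.8 is an assumed value for illustration only.
    """
    probs = {
        "5mC": p_5mc,
        "4mC": p_4mc,
        # The canonical probability is whatever is left over.
        "canonical": 1.0 - p_5mc - p_4mc,
    }
    state, best = max(probs.items(), key=lambda kv: kv[1])
    # Drop the call when the model is not confident enough.
    return state if best >= threshold else None

def pileup_counts(read_probs, threshold=0.8):
    """Count calls across all reads covering a single genomic position.

    read_probs is an iterable of (p_5mC, p_4mC) pairs, one per read.
    """
    counts = Counter()
    for p_5mc, p_4mc in read_probs:
        call = call_base(p_5mc, p_4mc, threshold)
        counts[call if call is not None else "filtered"] += 1
    return counts

# Four hypothetical reads covering the same position.
reads = [(0.90, 0.05), (0.95, 0.01), (0.02, 0.03), (0.50, 0.30)]
print(pileup_counts(reads))
```

In this toy example two reads are called 5mC, one is called canonical, and the last read is filtered because no state reaches the assumed threshold.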

nanoporetech / dorado

Question regarding the new dna_r10.4.1_e8.2_400bps_sup@v5.0.0 Model #882