nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

How to check the modification model version #1130

Closed: hungweichen0327 closed this issue 1 week ago

hungweichen0327 commented 1 week ago

Dear community,

I would like to know how to check the modification model version. For example, the current modification model version for dna_r10.4.1_e8.2_400bps_hac@v5.0.0 is v2, but I do not know whether I used the latest version (v2) or the previous one (v1). If I used the previous one (v1), would it affect the output read quality?

Thank you for the help.

malton-ont commented 1 week ago

@hungweichen0327,

The modbase model used is stored in the DS tag of the read group (@RG) header. See the docs here.

Using different modbase models will not affect read quality (qscore) but may affect the accuracy of the modbase calls.
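For anyone who wants to read the DS tag programmatically, here is a minimal Python sketch. It assumes the DS value is a space-separated list of key=value pairs with keys such as basecall_model and modbase_models; that layout is an assumption about Dorado's header, so check it against your own samtools view -H output. The function name parse_rg_ds and the example header are hypothetical.

```python
from typing import Dict, List


def parse_rg_ds(header_text: str) -> List[Dict[str, str]]:
    """Return the DS key=value pairs of every @RG line that has a DS tag.

    Assumption (not confirmed by the Dorado docs quoted here): the DS value
    is a space-separated list of key=value pairs, e.g.
    "basecall_model=... modbase_models=...".
    """
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue
        # SAM header fields are tab-separated TAG:VALUE entries.
        for field in line.split("\t")[1:]:
            if field.startswith("DS:"):
                pairs = {}
                for token in field[len("DS:"):].split():
                    key, sep, value = token.partition("=")
                    if sep:  # ignore tokens that are not key=value
                        pairs[key] = value
                read_groups.append(pairs)
    return read_groups


# Hypothetical header, for illustration only.
example_header = (
    "@HD\tVN:1.6\tSO:unknown\n"
    "@RG\tID:example\tPL:ONT\t"
    "DS:runid=example basecall_model=dna_r10.4.1_e8.2_400bps_hac@v5.0.0 "
    "modbase_models=dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mC_5hmC@v2.0.0\n"
)

for rg in parse_rg_ds(example_header):
    print(rg.get("basecall_model"), rg.get("modbase_models"))
```

With a real file, the header text can be captured with subprocess.run(["samtools", "view", "-H", "calls.bam"], capture_output=True, text=True, check=True).stdout and passed to the same function.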

HalfPhoton commented 1 week ago

Hi @hungweichen0327, following on from the documentation on model versions, you can check the model version by inspecting the SAM/BAM header:

# Print the full SAM/BAM header
samtools view -H calls.bam

# Grep for the model name(s); swap "hac" for "fast" or "sup" if needed
samtools view -H calls.bam | grep -o 'hac\S*'

The modbase calls will be slightly better with the v2 modbase models, but the canonical basecalls will be unchanged.
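To answer the original v1-versus-v2 question from a model name, here is a small Python sketch. It assumes the trailing "@vMAJOR[.MINOR[.PATCH]]" of a modbase model name is the modbase model version, as in the dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mC_5hmC@v2.0.0 example used in this thread. The helper names are hypothetical, and the sketch should only be given modbase model names: a plain basecall model name would return the basecall model version instead.

```python
import re
from typing import Tuple

# Assumption: a modbase model name ends with "@vMAJOR[.MINOR[.PATCH]]",
# and that trailing tag is the modbase model version.
_TRAILING_VERSION = re.compile(r"@v(\d+)(?:\.(\d+))?(?:\.(\d+))?$")


def modbase_model_version(model_name: str) -> Tuple[int, int, int]:
    """Return the trailing version of a model name as (major, minor, patch)."""
    match = _TRAILING_VERSION.search(model_name)
    if match is None:
        raise ValueError(f"no trailing @vX.Y.Z version in {model_name!r}")
    # Missing minor/patch parts (e.g. "@v1") are treated as 0.
    major, minor, patch = (int(part) if part else 0 for part in match.groups())
    return major, minor, patch


def uses_v2_or_newer(model_name: str) -> bool:
    """Hypothetical helper: True if the modbase model version is at least v2."""
    return modbase_model_version(model_name) >= (2, 0, 0)


print(uses_v2_or_newer("dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mC_5hmC@v2.0.0"))  # True
print(uses_v2_or_newer("dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mC_5hmC@v1"))  # False
```

Comparing tuples keeps the check correct for versions such as v2.1.0 or v10, where a plain string comparison would not be.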

Best regards, Rich

HalfPhoton commented 1 week ago

Updated documentation FAQ: "Which model did I use?"

hungweichen0327 commented 1 week ago

I would like to confirm the meaning of "modbase". Does modbase refer to the compatible modifications, i.e. 4mC_5mC, 5mCG_5hmCG, 5mC_5hmC and 6mA?

HalfPhoton commented 1 week ago

Modbase is an abbreviation of "modified base". For example, a "modbase model" is a modified base model, such as dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mC_5hmC@v2.0.0.
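The relationship between a modbase model name and the modbase codes can be illustrated with a short Python sketch. It assumes the layout "<basecall model>@v<version>_<codes joined by underscores>@v<version>", inferred from the example name above rather than from a formal specification; the class and function names are hypothetical.

```python
import re
from typing import List, NamedTuple


class ModbaseModelName(NamedTuple):
    basecall_model: str   # e.g. "dna_r10.4.1_e8.2_400bps_hac@v5.0.0"
    mod_codes: List[str]  # e.g. ["5mC", "5hmC"]
    version: str          # e.g. "v2.0.0"


# Assumed layout: <basecall model>@v<ver>_<codes joined by "_">@v<ver>
_MODBASE_NAME = re.compile(
    r"^(?P<basecall>.+?@v[\d.]+)_(?P<codes>.+)@(?P<version>v[\d.]+)$"
)


def split_modbase_model_name(name: str) -> ModbaseModelName:
    """Split a modbase model name into basecall model, modbase codes and version."""
    match = _MODBASE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised modbase model name: {name!r}")
    return ModbaseModelName(
        basecall_model=match["basecall"],
        mod_codes=match["codes"].split("_"),
        version=match["version"],
    )


parts = split_modbase_model_name("dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mC_5hmC@v2.0.0")
print(parts.basecall_model)  # dna_r10.4.1_e8.2_400bps_hac@v5.0.0
print(parts.mod_codes)       # ['5mC', '5hmC']
print(parts.version)         # v2.0.0
```

Under this assumption, the middle part of the name (here 5mC_5hmC) corresponds to one of the modbase code combinations listed in the question.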

Your examples are modbase codes, which are described under the model selection complex fields in the documentation.

We try to use "modified base" consistently throughout the documentation, but "modbase" is used in the codebase as shorthand.