nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

differences between 5mCG_5hmCG and 5mC_5hmC #440

Closed Puputnik closed 1 year ago

Puputnik commented 1 year ago

Hi,

I was playing around with different modification models on a DNA dataset. I tested both the 5mCG_5hmCG@V3 and 5mC_5hmC@V1 models (with sup@v4.2.0 as the basecalling model). What are the differences between the two models? I actually got identical results from the two tests (exactly the same MM and ML tags for reads with the same qname). I was under the impression that 5mC_5hmC investigates all Cs while 5mCG_5hmCG only looks at CGs, but now I'm not sure anymore.

Can you please clarify?

Thanks a lot!

ArtRand commented 1 year ago

Hello @Puputnik,

Thanks for bringing this up! The 5mCG_5hmCG model should only make calls at CpG dinucleotides. We're going to put out a patch release to fix this configuration problem.
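For anyone who wants to check their own output, here is a rough sketch (not part of dorado) of how one might confirm that the calls in an MM tag fall only on CpG sites. The function names `mm_positions` and `non_cpg_calls` are made up for this example. It assumes the read sequence is in its original basecalled orientation (for reverse-strand alignments the SEQ from the BAM would need to be reverse complemented first) and that only `+` strand entries with a concrete canonical base such as `C` are present.

```python
def mm_positions(seq, mm_tag):
    """Return {mod_code: [0-based positions in seq]} for an MM tag value.

    Assumption: entries look like "C+m?,0,1,0;" where each number is the
    count of unmodified-candidate bases (here C) skipped before the next call.
    """
    calls = {}
    for entry in mm_tag.strip().split(";"):
        if not entry:
            continue
        fields = entry.split(",")
        header = fields[0]                 # e.g. "C+m?" or "C+h"
        base = header[0]                   # canonical base the calls refer to
        code = header[2:].rstrip("?.")     # modification code, e.g. "m" or "h"
        base_idx = [i for i, b in enumerate(seq) if b == base]
        positions = []
        cursor = 0
        for skip in fields[1:]:
            cursor += int(skip)            # skip this many occurrences of base
            positions.append(base_idx[cursor])
            cursor += 1                    # move past the base just reported
        calls[code] = positions
    return calls


def non_cpg_calls(seq, mm_tag):
    """List (mod_code, position) pairs whose C is not followed by G."""
    offenders = []
    for code, positions in mm_positions(seq, mm_tag).items():
        for pos in positions:
            if seq[pos:pos + 2] != "CG":
                offenders.append((code, pos))
    return offenders


if __name__ == "__main__":
    read = "ACGTCCGATCG"
    print(mm_positions(read, "C+m?,0,1,0;"))
    print(non_cpg_calls(read, "C+h?,1;C+m?,1;"))
```

With a CpG-only model, `non_cpg_calls` would be expected to return an empty list for every read; a non-empty list suggests calls are being made at Cs outside a CpG context.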

tijyojwad commented 1 year ago

Hi @Puputnik, we just released a patched version of dorado with the correct model. The new dorado can be downloaded from https://github.com/nanoporetech/dorado#installation. The fixed model version for 5mCG_5hmCG is v3.1. If you use the --modified-bases option, the correct model will be downloaded automatically.