nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Dorado and Medaka? #155

Closed tnn111 closed 7 months ago

tnn111 commented 1 year ago

Is there a Dorado model that can be used with Medaka? I can't seem to find a suitable model.

Thanks.

ChristopherRichie commented 1 year ago

I am looking for this too. Is there a lookup chart that links basecaller models to Medaka models? The naming formats of the two systems do not line up, so I am always guessing.

davidnewman02 commented 1 year ago

The current generation of DNA basecalling and consensus models is v4.2: dna_r10.4.1_e8.2_400bps_*@v4.2.0. These should be the default in the latest releases of both Dorado and Medaka.

Initial release of v4.2 models was in the following versions:

ChristopherRichie commented 1 year ago

I found a lookup table in the wf-clone-validation files that may address the OP's question. It is likely available elsewhere too, but I haven't found it yet.

(https://github.com/epi2me-labs/wf-clone-validation/blob/97e1eabda31820e923202d9a8eca7cb640f35b57/data/medaka_models.tsv?plain=1#L1-L30)

Depending on your use case, you may hit this obstacle: the naming conventions for Dorado and Medaka models are not uniform. For example, the Dorado model dna_r10.4.1_e8.2_400bps_sup@v3.5.2 matches the Medaka model r1041_e82_400bps_sup_g632.

Therefore, I think the Medaka name for the model given in the previous reply would drop the "dna_" prefix, remove the dots from the pore and kit fields, and replace the '@' with an underscore, giving something like this: r1041_e82_400bps_hac_v4.2.0
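To make that guess concrete, here is a minimal Python sketch of the renaming rule. It is only an illustration inferred from the names quoted in this thread, not something taken from the Dorado or Medaka code. The function name guess_medaka_model is hypothetical, and the cutoff at v4.2.0 is an assumption, chosen because that is the only version this thread pairs under the new scheme. Always check the result against the model list that Medaka itself ships.

```python
import re

# Assumption: only names shaped like dna_r10.4.1_e8.2_400bps_hac@v4.2.0
# are handled. The pattern is inferred from this thread, not from Dorado.
_DORADO_NAME = re.compile(
    r"^dna_r(?P<pore>\d+(?:\.\d+)*)_e(?P<kit>\d+(?:\.\d+)*)_"
    r"(?P<speed>\d+bps)_(?P<variant>fast|hac|sup)@v(?P<version>\d+\.\d+\.\d+)$"
)

# Assumption: v4.2.0 is the first version treated as following the scheme.
_MIN_VERSION = (4, 2, 0)


def guess_medaka_model(dorado_model: str) -> str:
    """Guess the Medaka model name for a Dorado model name (hypothetical helper)."""
    m = _DORADO_NAME.match(dorado_model)
    if m is None:
        raise ValueError(f"unrecognised Dorado model name: {dorado_model!r}")
    version = tuple(int(part) for part in m["version"].split("."))
    if version < _MIN_VERSION:
        # Older models use suffixes such as g632, which cannot be derived.
        raise ValueError(f"no naming rule known for {dorado_model!r}")
    pore = m["pore"].replace(".", "")  # 10.4.1 -> 1041
    kit = m["kit"].replace(".", "")    # 8.2 -> 82
    return f"r{pore}_e{kit}_{m['speed']}_{m['variant']}_v{m['version']}"


if __name__ == "__main__":
    print(guess_medaka_model("dna_r10.4.1_e8.2_400bps_hac@v4.2.0"))
```

For older models such as dna_r10.4.1_e8.2_400bps_sup@v3.5.2 the sketch raises ValueError on purpose, because the g632-style suffix has to come from a lookup table.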

My search for a "lookup table" stems from needing something to feed into the wf-clone-validation parameters.

I found the separate model lists at these links.

From the Dorado files: https://github.com/nanoporetech/dorado/blob/0598965a56c9ca1c6db856d17ee508edf71ec8de/README.md?plain=1#L141-L172

From the Medaka files: https://github.com/nanoporetech/medaka/blob/ae8a369729a51e5d515395b4b59a756442af6325/medaka/options.py#L20-L96

davidnewman02 commented 1 year ago

The model version now matches between the names in Dorado and Medaka/Clair3, which should make it clearer how to pair the models. In the example below, the Dorado model suffix 'hac@v4.2.0' pairs with 'hac_v4.2.0' in Medaka and 'hac_v420' in Clair3.

Dorado:   dna_r10.4.1_e8.2_400bps_hac@v4.2.0   
Medaka:   r1041_e82_400bps_hac_v4.2.0_model.tar.gz
Clair3:   r1041_e82_400bps_hac_v420_model

apredeus commented 1 year ago

Which Medaka model should be used for 9.4.1 chemistry? For example, if I used the dna_r9.4.1_e8_sup@v3.6 model in Dorado?

davidnewman02 commented 7 months ago

Hi @apredeus. Please use the r941_prom_sup_g507 medaka model to work with the dna_r9.4.1_e8_sup@v3.6 basecall model.

tweilin commented 7 months ago

Hi, just wondering: what is the Medaka model for the Dorado model dna_r10.4.1_e8.2_400bps_sup@v4.2.0?

HalfPhoton commented 7 months ago

Hi @tweilin, the models can be viewed on the Medaka GitHub page, and the correct model is r1041_e82_400bps_sup_v4.2.0_model.tar.gz
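For anyone who wants the pairings from this thread in one place, the snippet below collects them into a small Python lookup table. It is a sketch that contains only the pairs quoted above, so it is not a complete or official mapping. The names THREAD_PAIRS and medaka_model_for are hypothetical, and the "_model.tar.gz" suffix is left off the Medaka names on the assumption that it is only the download file name.

```python
from typing import Optional

# Dorado model name -> Medaka model name, exactly as quoted in this thread.
# Assumption: "_model.tar.gz" is a file suffix, not part of the model name.
THREAD_PAIRS = {
    "dna_r10.4.1_e8.2_400bps_sup@v3.5.2": "r1041_e82_400bps_sup_g632",
    "dna_r10.4.1_e8.2_400bps_hac@v4.2.0": "r1041_e82_400bps_hac_v4.2.0",
    "dna_r10.4.1_e8.2_400bps_sup@v4.2.0": "r1041_e82_400bps_sup_v4.2.0",
    "dna_r9.4.1_e8_sup@v3.6": "r941_prom_sup_g507",
}


def medaka_model_for(dorado_model: str) -> Optional[str]:
    """Return the Medaka model paired in this thread, or None if not listed."""
    return THREAD_PAIRS.get(dorado_model)


if __name__ == "__main__":
    for name in THREAD_PAIRS:
        print(name, "->", medaka_model_for(name))
```

A name that is not in the table returns None rather than a guess, so a missing pairing is easy to spot before it reaches a pipeline parameter.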