nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

dorado model dna_r9.4.1_e8_sup@v3.6 support? #474

Closed ChristopherBurgess-USDA closed 6 months ago

ChristopherBurgess-USDA commented 7 months ago

Hello, I recently used dorado with the dna_r9.4.1_e8_sup@v3.6 model. However, when I specify that model in my wf_amplicon pipeline, I get the medaka error below:

ERROR ~ * --basecaller_cfg: dna_r9.4.1_e8_sup@v3.6 is not a valid choice, pick one of:
    - dna_r10.4.1_e8.2_400bps_hac@v4.2.0
    - dna_r10.4.1_e8.2_400bps_sup@v4.2.0
    - dna_r10.4.1_e8.2_260bps_hac@v4.1.0
    - dna_r10.4.1_e8.2_260bps_sup@v4.1.0
    - dna_r10.4.1_e8.2_400bps_hac@v4.1.0
    - dna_r10.4.1_e8.2_400bps_sup@v4.1.0
    - dna_r10.4.1_e8.2_400bps_hac@v3.5.2
    - dna_r10.4.1_e8.2_400bps_sup@v3.5.2
    - dna_r9.4.1_e8_fast@v3.4
    - dna_r9.4.1_e8_hac@v3.3
    - dna_r9.4.1_e8_sup@v3.3
    - dna_r10.4.1_e8.2_400bps_hac_prom
    - dna_r9.4.1_450bps_hac_prom
    - dna_r10.3_450bps_hac
    - dna_r10.3_450bps_hac_prom
    - dna_r10.4.1_e8.2_260bps_hac
    - dna_r10.4.1_e8.2_260bps_hac_prom
    - dna_r10.4.1_e8.2_400bps_hac
    - dna_r9.4.1_450bps_hac
    - dna_r9.4.1_e8.1_hac
    - dna_r9.4.1_e8.1_hac_prom

 -- Check '.nextflow.log' file for details

Are there plans to add the dna_r9.4.1_e8_sup@v3.6 dorado model to medaka, or should I use an older model?
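
For anyone weighing the "older model" option, here is a minimal sketch of how one might list the accepted choices that share the same base name as the rejected model. The list is copied from the error message above (versioned names only); the helper names `split_model` and `closest_supported` are hypothetical and not part of medaka or the workflow. It only finds name matches and says nothing about whether a model trained for v3.3 calls polishes v3.6 calls well.

```python
# Versioned choices copied from the error message above.
SUPPORTED = [
    "dna_r10.4.1_e8.2_400bps_hac@v4.2.0",
    "dna_r10.4.1_e8.2_400bps_sup@v4.2.0",
    "dna_r10.4.1_e8.2_260bps_hac@v4.1.0",
    "dna_r10.4.1_e8.2_260bps_sup@v4.1.0",
    "dna_r10.4.1_e8.2_400bps_hac@v4.1.0",
    "dna_r10.4.1_e8.2_400bps_sup@v4.1.0",
    "dna_r10.4.1_e8.2_400bps_hac@v3.5.2",
    "dna_r10.4.1_e8.2_400bps_sup@v3.5.2",
    "dna_r9.4.1_e8_fast@v3.4",
    "dna_r9.4.1_e8_hac@v3.3",
    "dna_r9.4.1_e8_sup@v3.3",
]


def split_model(name):
    """Split 'base@vX.Y.Z' into (base, version tuple); version is () if absent."""
    base, _, version = name.partition("@v")
    parts = tuple(int(p) for p in version.split(".")) if version else ()
    return base, parts


def closest_supported(requested, supported=SUPPORTED):
    """Return supported names sharing the requested base name, newest version first."""
    base, _ = split_model(requested)
    matches = [s for s in supported if split_model(s)[0] == base]
    return sorted(matches, key=lambda s: split_model(s)[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical usage: look up fallbacks for the model rejected above.
    print(closest_supported("dna_r9.4.1_e8_sup@v3.6"))
```

For the model in this issue, the only same-base choice in the list is dna_r9.4.1_e8_sup@v3.3.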

cjw85 commented 6 months ago

My colleagues tell me that no medaka models for v3.6 models have been trained. My advice would be to basecall data with a later version of dorado and use the corresponding medaka models.

olawa commented 5 months ago

@cjw85 v3.6 is the latest (last?) model for R9 data. Are there still no plans to release a corresponding medaka model?

cjw85 commented 5 months ago

There are no current plans to release medaka models bespoke to the v3.6 basecaller models.

olawa commented 4 months ago

Hi @cjw85, what would the best option be for R9.4.1 assemblies then? There is also no model for dorado sup v3.3. Is there a guide somewhere on how to (re)train models for medaka? Or would you recommend just using the latest guppy sup model (507?) with dorado calls? The increase in basecalling accuracy with v3.6 is quite significant, so it would of course be good if it could be used with medaka.

olawa commented 3 months ago

In case someone else has the same issue: sup_g507 did indeed improve the assembly from dorado@v3.6, but of course it is difficult to know how well it works. I found https://nanoporetech.github.io/katuali/medaka_train.html, but it is from 2020 and uses fast5 and guppy, so I am not sure it will work with dorado.

JWDebler commented 1 week ago

I would also like to see a medaka model for dna_r9.4.1_e8_sup@v3.6 please.