nanoporetech / medaka

Sequence correction provided by ONT Research

medaka_consensus Model for Guppy 6.4.6 #448

Closed by FrostFlow13 11 months ago

FrostFlow13 commented 11 months ago

I'm trying to run medaka_consensus on ONT long-read data, and I'm having trouble working out which medaka model matches my basecalling model. There's a (seemingly outdated) guide under "Models" that explains how to choose one, but the newest versions have dropped the clear names based on the Guppy models. Looking through the issues, I see that a couple of other users have asked about this before.

To give additional information on my long-read data:

Machine = PromethION 24
Flow cell type = FLO-PRO114M (R10.4.1)
Kit type = SQK-NBD114-24 (V14)
Basecalling = High-accuracy model, 400 bps
MinKNOW = 22.12.5
Bream = 7.4.8
Configuration = 5.4.7
Guppy = 6.4.6
MinKNOW Core = 5.4.3
basecalling_config_filename: "dna_r10.4.1_e8.2_400bps_hac_prom.cfg"

I've also attached the .md and .json (.txt version) report files we were provided, in case they contain any additional information that I didn't include.

For my specifications, what consensus model should I be using: r1041_e82_400bps_hac_g632, r1041_e82_400bps_hac_v4.0.0, or r1041_e82_400bps_hac_v4.1.0? Or one I didn't list?

I believe a safe bet would be r1041_e82_400bps_hac_g632, as it satisfies the "Models" section's suggestion: "Where a version of Guppy has been used without an exactly corresponding medaka model, the medaka model with the highest version equal to or less than the guppy version should be selected." However, if one of the newer models is an improvement over the g632 version and is still compatible with the components I'm using, I'd definitely want to go with that one instead!
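The selection rule quoted above can be sketched in a few lines of Python. This is only an illustration, not part of medaka: the helper names `guppy_version_from_model` and `pick_model` are hypothetical, and the parsing assumes that a `_g<digits>` suffix encodes the Guppy version as one digit for major, one for minor, and the remaining digits for patch (so `g632` is read as 6.3.2). Names ending in `_v4.0.0` or `_v4.1.0` carry no Guppy version under this assumption and are skipped.

```python
import re


def guppy_version_from_model(name):
    """Return (major, minor, patch) parsed from a '_g<digits>' suffix, or None.

    Assumption: the suffix is <major digit><minor digit><patch digits>.
    """
    match = re.search(r"_g(\d)(\d)(\d+)$", name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def pick_model(candidates, guppy_version):
    """Pick the candidate with the highest Guppy version <= guppy_version."""
    target = tuple(int(part) for part in guppy_version.split("."))
    eligible = []
    for name in candidates:
        version = guppy_version_from_model(name)
        if version is not None and version <= target:
            eligible.append((version, name))
    if not eligible:
        return None
    # Tuples compare element by element, so max() returns the newest version.
    return max(eligible)[1]


# The three model names mentioned in this issue.
candidates = [
    "r1041_e82_400bps_hac_g632",
    "r1041_e82_400bps_hac_v4.0.0",
    "r1041_e82_400bps_hac_v4.1.0",
]

print(pick_model(candidates, "6.4.6"))
```

With the three candidates listed in this issue and Guppy 6.4.6, the sketch selects the g632 model, since it is the only one whose name carries a Guppy version at or below 6.4.6.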

Additionally, I think it might be helpful to update the "Models" section to include some information on the new naming system for the models.

report_PAO85317_20230615_1429_373a141b.md report_PAO85317_20230615_1429_373a141b.json.txt

ftostevin-ont commented 11 months ago

Yes, r1041_e82_400bps_hac_g632 is the correct model for that Guppy basecaller version.

The medaka and basecaller models are paired, and using a newer medaka model with an older basecaller will probably yield worse results.
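For readers who want to pass the model explicitly, here is a minimal sketch that assembles a `medaka_consensus` command line in Python. The file names `basecalls.fastq` and `draft.fasta`, the output directory `medaka_out`, and the thread count are placeholders, and the helper `build_medaka_command` is hypothetical. The sketch only prints the command; it does not run medaka.

```python
import shlex


def build_medaka_command(basecalls, draft, outdir, model, threads=4):
    """Return the argument list for a medaka_consensus run with an explicit model."""
    return [
        "medaka_consensus",
        "-i", basecalls,     # basecalled reads
        "-d", draft,         # draft assembly to polish
        "-o", outdir,        # output directory
        "-t", str(threads),  # number of threads
        "-m", model,         # medaka model paired with the basecaller
    ]


cmd = build_medaka_command(
    basecalls="basecalls.fastq",  # placeholder path
    draft="draft.fasta",          # placeholder path
    outdir="medaka_out",          # placeholder path
    model="r1041_e82_400bps_hac_g632",
)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

Keeping the model name as an explicit argument makes the basecaller and medaka pairing visible in the run record, rather than relying on whatever default the installed medaka version ships with.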

FrostFlow13 commented 11 months ago

Thank you - I appreciate the confirmation!