nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com
Other

Questions about consensus polishing models for Version 4 basecallers introduced in medaka v1.7.3 #424

Closed ilnamkang closed 8 months ago

ilnamkang commented 1 year ago

Hi,

First, note that this issue is for asking questions, not a feature request. I couldn't find where to ask questions in this repository.

I have two questions.

1. What are the names of the consensus polishing models for version 4 basecallers? Are they the models whose names end with v4.0.0?

2. What does "Version 4 basecallers" refer to?

Thanks.

cjw85 commented 1 year ago

The names are:

r1041_e82_400bps_hac_v4.0.0
r1041_e82_400bps_sup_v4.0.0

I must admit, what makes the basecaller models version 4 and not version 1, 2, 3, 5, 6, or anything else, I am unsure.

iiSeymour commented 1 year ago

The major version represents the basecaller model neural network architecture and minor is a data/training improvement.
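Taking that convention at face value, here is a minimal sketch of how one might compare two model version strings: models sharing a major version would share a network architecture, while a minor bump would indicate only new data/training. The `ModelVersion` class is hypothetical and not part of medaka or Guppy, and the patch number is parsed but not interpreted, since the thread does not say what it means.

```python
from typing import NamedTuple


class ModelVersion(NamedTuple):
    """A basecaller model version such as 'v4.0.0' (hypothetical helper).

    Assumed meaning, per the comment above: major = network architecture,
    minor = data/training improvement. Patch is parsed but not interpreted.
    """
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "ModelVersion":
        # Accept an optional leading 'v', e.g. 'v4.0.0' or '4.0.0'.
        parts = text.lstrip("v").split(".")
        if len(parts) != 3:
            raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
        major, minor, patch = (int(p) for p in parts)
        return cls(major, minor, patch)

    def same_architecture(self, other: "ModelVersion") -> bool:
        # Under the stated convention, equal major versions imply the
        # same neural network architecture.
        return self.major == other.major
```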

ilnamkang commented 1 year ago

If my nanopore data was obtained from R10.4.1 flow cell and was basecalled using "dna_r10.4.1_e8.2_400bps_sup.cfg" model by Guppy v6.4.6, then is it okay to use "r1041_e82_400bps_sup_g615" model for medaka-cpu v1.7.3?

Or, would "r1041_e82_400bps_sup_v4.0.0" model be better for me?
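The thread does not settle which suffix fits this data. What the names quoted so far do suggest is a mechanical correspondence between the Guppy config name and the start of the medaka model name (`dna_r10.4.1_e8.2_400bps_sup.cfg` versus `r1041_e82_400bps_sup_...`). The sketch below, with hypothetical helpers `medaka_prefix` and `candidates`, assumes that correspondence holds and only narrows the list; choosing between `g615` and `v4.0.0` is still left to the user.

```python
import re

# ASSUMPTION: Guppy config names follow the pattern seen in this thread,
# e.g. dna_r10.4.1_e8.2_400bps_sup.cfg.
CFG_RE = re.compile(
    r"^dna_r(?P<pore>[\d.]+)_e(?P<motor>[\d.]+)_(?P<speed>\d+bps)_"
    r"(?P<acc>fast|hac|sup)\.cfg$"
)


def medaka_prefix(cfg_name: str) -> str:
    """Hypothetical: derive a medaka-style prefix by dropping 'dna_' and
    '.cfg' and removing dots (r10.4.1 -> r1041, e8.2 -> e82)."""
    m = CFG_RE.match(cfg_name)
    if m is None:
        raise ValueError(f"unrecognised config name: {cfg_name!r}")
    pore = m["pore"].replace(".", "")
    motor = m["motor"].replace(".", "")
    return f"r{pore}_e{motor}_{m['speed']}_{m['acc']}"


def candidates(cfg_name: str, available: list[str]) -> list[str]:
    """Return every available model sharing the derived prefix; the
    trailing suffix (e.g. g615 or v4.0.0) is not resolved here."""
    prefix = medaka_prefix(cfg_name) + "_"
    return [name for name in available if name.startswith(prefix)]
```

For the setup described above, passing the sup config and a list containing `r1041_e82_400bps_sup_g615`, `r1041_e82_400bps_sup_v4.0.0` and `r1041_e82_400bps_hac_v4.0.0` would return both sup models, which is exactly the ambiguity being asked about.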

Kirk3gaard commented 1 year ago

> The names are:
>
> r1041_e82_400bps_hac_v4.0.0
> r1041_e82_400bps_sup_v4.0.0
>
> I must admit, what makes the basecaller models version 4 and not version 1, 2, 3, 5, 6, or anything else, I am unsure.

No "fast" model coming for this release?

The new extended names are not making things much easier yet (#77). I guess most people will be aware of the pore number, and possibly also the sequencing speed, but the "e82" part is already a bit tricky. I can imagine that the introduction of 5 kHz vs 4 kHz sampling is going to add another layer of complexity/confusion.
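To make the fields concrete, here is a minimal sketch that splits the model names quoted in this thread into their parts. The regular expression is inferred from those few names only (it is not medaka's own parser), the reading of `e82` as the E8.2 motor protein follows a later comment in this thread, and treating `g615` as Guppy 6.1.5 is an assumption. Note that no field in these names records the sampling rate, which is where a 4 kHz vs 5 kHz distinction would have to go.

```python
import re
from dataclasses import dataclass

# ASSUMED layout, inferred only from names quoted in this thread, e.g.
# r1041_e82_400bps_sup_v4.0.0 or r1041_e82_400bps_sup_g615.
NAME_RE = re.compile(
    r"^r(?P<pore>\d+)_e(?P<motor>\d+)_(?P<speed>\d+)bps_"
    r"(?P<acc>fast|hac|sup)_(?P<tag>v\d+\.\d+\.\d+|g\d+)$"
)


@dataclass(frozen=True)
class ModelName:
    pore: str       # e.g. '1041' for an R10.4.1 flow cell
    motor: str      # e.g. '82' for the E8.2 motor protein
    speed_bps: int  # translocation speed in bases per second
    accuracy: str   # 'fast', 'hac' or 'sup'
    tag: str        # 'v4.0.0' (model version) or 'g615' (assumed Guppy 6.1.5)


def parse_model_name(name: str) -> ModelName:
    """Split a model name into its fields under the assumed layout."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    return ModelName(m["pore"], m["motor"], int(m["speed"]), m["acc"], m["tag"])
```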

ls47 commented 1 year ago

> If my nanopore data was obtained from R10.4.1 flow cell and was basecalled using "dna_r10.4.1_e8.2_400bps_sup.cfg" model by Guppy v6.4.6, then is it okay to use "r1041_e82_400bps_sup_g615" model for medaka-cpu v1.7.3?
>
> Or, would "r1041_e82_400bps_sup_v4.0.0" model be better for me?

I have the exact same setup and question. Also, will there be documentation for the different models? The description in the README is wrong, since you need to use e82 (motor protein) instead of min (MinION).

cjw85 commented 8 months ago

I'm working on a system to allow the correct model to be chosen at runtime by inspecting the input files. This should make the issue here irrelevant for most users. I will close this issue as, in some sense, a duplicate of https://github.com/nanoporetech/medaka/issues/419
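As a rough illustration of what runtime selection could look like, the sketch below reads the first FASTQ header and looks up a medaka model from the basecaller model recorded there. Everything specific is an assumption: the `basecall_model` header key, the `dna_...@v4.0.0` value format and the lookup table are hypothetical stand-ins, not what medaka actually reads; the real mechanism is the subject of issue #419.

```python
import io
from typing import Optional, TextIO

# HYPOTHETICAL: this header key is an assumption, not medaka's actual field.
MODEL_KEY = "basecall_model"


def basecall_model_from_fastq(handle: TextIO) -> Optional[str]:
    """Return the MODEL_KEY value from the first FASTQ header, or None."""
    header = handle.readline().strip()
    if not header.startswith("@"):
        raise ValueError("input does not look like FASTQ")
    # Tokens after the read ID are assumed to be key=value pairs.
    for token in header.split()[1:]:
        key, sep, value = token.partition("=")
        if sep and key == MODEL_KEY:
            return value
    return None


def choose_medaka_model(basecaller_model: str, table: dict[str, str]) -> str:
    """Map a basecaller model name to a medaka model via a lookup table."""
    try:
        return table[basecaller_model]
    except KeyError:
        raise ValueError(f"no medaka model known for {basecaller_model!r}") from None


if __name__ == "__main__":
    # HYPOTHETICAL example record and mapping, for illustration only.
    fastq = io.StringIO(
        "@read1 runid=abc basecall_model=dna_r10.4.1_e8.2_400bps_sup@v4.0.0\n"
        "ACGT\n+\n!!!!\n"
    )
    table = {"dna_r10.4.1_e8.2_400bps_sup@v4.0.0": "r1041_e82_400bps_sup_v4.0.0"}
    found = basecall_model_from_fastq(fastq)
    if found is not None:
        print(choose_medaka_model(found, table))
```

Failing loudly on an unknown basecaller model, rather than falling back to a default, keeps a mismatched polishing model from being applied silently.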