nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Model for Guppy 6.1.2 super accurate mode #370

Closed JohnUrban closed 2 years ago

JohnUrban commented 2 years ago

Hello,

I have 5-10X coverage of ultra-long reads (N50 120 kb), generated recently using an R9 Spot-On flow cell on the MinION.

I basecalled on Windows with the Guppy GPU basecaller v6.1.2 and assembled with Flye v2.9-b1774 using 1 round of Flye polishing (I am also trying 3 rounds). Likely owing to the length of the reads, I generated a decent assembly despite the low coverage (1-2 Mb NG50).

I then did further polishing with Medaka v1.6.0 (installed this week with Conda). However, there was no model that exactly matched my situation. The model I chose was r941_min_sup_g507. Will this be OK? Do you plan on releasing an r941_min_sup_g612 model? Or would there be little difference from the g507 model?

Best,

John

P.S. The available models were:

r103_fast_g507
r103_hac_g507
r103_min_high_g345
r103_min_high_g360
r103_prom_high_g360
r103_sup_g507
r104_e81_fast_g5015
r104_e81_hac_g5015
r104_e81_sup_g5015
r10_min_high_g303
r10_min_high_g340
r941_min_fast_g303
r941_min_fast_g507
r941_min_hac_g507
r941_min_high_g303
r941_min_high_g330
r941_min_high_g340_rle
r941_min_high_g344
r941_min_high_g351
r941_min_high_g360
r941_min_sup_g507
r941_prom_fast_g303
r941_prom_fast_g507
r941_prom_hac_g507
r941_prom_high_g303
r941_prom_high_g330
r941_prom_high_g344
r941_prom_high_g360
r941_prom_high_g4011
r941_prom_sup_g507
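
For readers comparing names in the list above, here is a minimal sketch of narrowing the list to the models for one pore, device, and basecaller variant. It assumes the `{pore}_{device}_{caller variant}_{caller version}` naming convention described in the medaka documentation; the helper `matching_models` and the shortened `AVAILABLE` list are hypothetical and only for illustration.

```python
def matching_models(models, pore, device, variant):
    """Return the model names that match a pore, device and caller variant.

    Assumes names of the form {pore}_{device}_{variant}_g{version}.
    """
    prefix = f"{pore}_{device}_{variant}_g"
    return [name for name in models if name.startswith(prefix)]


# A subset of the models listed above, kept short for illustration.
AVAILABLE = [
    "r941_min_fast_g507",
    "r941_min_hac_g507",
    "r941_min_high_g360",
    "r941_min_sup_g507",
    "r941_prom_sup_g507",
]

if __name__ == "__main__":
    # For R9.4.1, MinION, super accurate mode:
    print(matching_models(AVAILABLE, "r941", "min", "sup"))
```

With this subset, the only MinION super accurate candidate for R9.4.1 is r941_min_sup_g507, which is the model chosen above.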

cjw85 commented 2 years ago

I think we should have a refresh of models in the coming days.

dbtara commented 2 years ago

Are there any timelines for when models for Guppy 6.1.2 may be available?

dbtara commented 2 years ago

Bumping this to see if there is a timeline.

addyblanch commented 2 years ago

> I think we should have a refresh of models in the coming days.

Any updates? It's been over a month now.

cjw85 commented 2 years ago

Looking at Guppy changelogs and doing md5sums of the model files in Guppy 5.0.7 and 6.1.2 suggests that the R9.4.1 DNA models haven't changed, so the r941_min_sup_g507 model is correct for use with Guppy 6.1.2.
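
The checksum comparison described above could be sketched roughly as follows. This is not the exact procedure that was used, only a minimal illustration: the directory paths are hypothetical placeholders, and the location and naming of model files within a Guppy installation are assumptions that need to be adjusted to the actual install.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_model_dirs(old_dir, new_dir, pattern="*"):
    """Map each file name present in both directories to True if the two
    copies have the same MD5 digest, otherwise False."""
    old_files = {p.name: p for p in Path(old_dir).glob(pattern) if p.is_file()}
    new_files = {p.name: p for p in Path(new_dir).glob(pattern) if p.is_file()}
    shared = sorted(old_files.keys() & new_files.keys())
    return {
        name: md5sum(old_files[name]) == md5sum(new_files[name])
        for name in shared
    }


if __name__ == "__main__":
    # Hypothetical paths: point these at the model directories of the two
    # Guppy versions being compared.
    result = compare_model_dirs("guppy_5.0.7/data", "guppy_6.1.2/data")
    for name, same in result.items():
        print(f"{name}\t{'identical' if same else 'DIFFERENT'}")
```

Files that exist in only one of the two directories are skipped, so a model that was added or removed between versions would need to be checked separately.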

There are new models for R10.4.1 in medaka v1.6.1.

JohnUrban commented 2 years ago

Ouch. That is brutal to find out considering the consensus sequence output by Flye is consistently outperforming further polishing with Medaka in my tests.

cjw85 commented 2 years ago

@JohnUrban are your test data and metrics available to share? I don't actively work on medaka myself anymore, but there are people who would like to examine your data.