nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Are multiple rounds of polishing with medaka recommended? #514

Closed: owenwilkins closed this issue 2 months ago

owenwilkins commented 2 months ago

I am currently polishing a large eukaryotic genome (mouse) assembled with Flye.

As noted in the documentation, I understand that the latest version of medaka is intended to be run directly on the output of Flye, rather than after one or several rounds of polishing with Racon. However, the documentation does not say whether multiple rounds of medaka are ever recommended or required to produce a high-quality final consensus sequence.

After one round of medaka, I am still seeing features in the assembly that look like errors typical of unpolished sequences, which is what prompted my question.

Thanks in advance,

cjw85 commented 2 months ago

No.

owenwilkins commented 2 months ago

OK, thanks very much for your response. Could you comment on how medaka achieves in a single round what other approaches may need multiple rounds to do?

cjw85 commented 2 months ago

The inference models in medaka are trained to examine alignments of reads to a scaffolding sequence in order to predict the true sequence. They are therefore conditioned on the errors, relative to the true sequence, of both the scaffolding sequence and the reads (and so, implicitly, on the relative error between the scaffolding sequence and the reads).

After medaka has been applied, the scaffolding sequence will have changed, and so will the relative errors. The models are then no longer appropriate for further correction.
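To make this point concrete, here is a toy sketch. It is not medaka's actual model, which is a trained neural network operating on read alignments. It is a per-base Bayesian caller whose prior encodes an assumed error rate for the draft (scaffolding) sequence. The function names, the binary-alphabet simplification and all error rates are hypothetical illustration values. With identical pileup evidence, a prior calibrated for an unpolished draft says "change this base" (posterior of about 0.015 that the draft is correct). The same evidence under a prior matching an already-polished draft says "keep it" (about 0.60). This is one way a model conditioned on a raw draft's error profile can mis-correct a draft it has already polished.

```python
def posterior_draft_correct(k_agree, n_reads, read_err, draft_err):
    """Toy P(draft base is correct | k_agree of n_reads agree with it).

    Hypothetical binary-alphabet simplification: a read agrees with a correct
    draft base with probability 1 - read_err, and with a wrong draft base only
    when it errs, with probability read_err. The binomial coefficient is the
    same in both likelihoods, so it cancels and is omitted.
    """
    k, m = k_agree, n_reads - k_agree
    like_correct = (1 - read_err) ** k * read_err ** m
    like_wrong = read_err ** k * (1 - read_err) ** m
    # Prior: the error rate of the draft that the model was calibrated for.
    num = like_correct * (1 - draft_err)
    return num / (num + like_wrong * draft_err)


def should_change_base(k_agree, n_reads, read_err, draft_err):
    # Edit the draft base only if it is more likely wrong than right.
    return posterior_draft_correct(k_agree, n_reads, read_err, draft_err) < 0.5


if __name__ == "__main__":
    # Same pileup in both cases: 3 of 10 reads agree with the draft base.
    pileup = dict(k_agree=3, n_reads=10, read_err=0.1)
    # Prior calibrated for an unpolished draft (assumed 1% draft error).
    p_raw = posterior_draft_correct(draft_err=1e-2, **pileup)
    # Prior matching an already-polished draft (assumed 0.01% draft error).
    p_polished = posterior_draft_correct(draft_err=1e-4, **pileup)
    print(f"raw-draft prior:      P(correct)={p_raw:.4f} change={p_raw < 0.5}")
    print(f"polished-draft prior: P(correct)={p_polished:.4f} change={p_polished < 0.5}")
```

In the toy, the only thing that differs between the two runs is the assumed draft error rate, which stands in for the "relative error between the scaffolding sequence and the reads" that the trained models implicitly learn.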

owenwilkins commented 2 months ago

OK, thanks, that's useful to know.