nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Use what kinds of reads for medaka consensus? #509

Closed by hungweichen0327 2 weeks ago

hungweichen0327 commented 2 weeks ago

Hello,

Thank you for the useful software. I have two questions.

  1. Should I use all reads (dx:1, dx:0, and dx:-1) for medaka_consensus, or only duplex reads plus simplex reads without duplex offspring (dx:0)?

  2. Do I need to filter the reads (for example, read length > 1 kb and quality score Q > 10), or should I use the raw reads?

Thank you.

cjw85 commented 2 weeks ago

Medaka is intended for use with simplex data, specifically the reads classified as "pass" by the sequencer. Read length isn't typically a consideration.

hungweichen0327 commented 2 weeks ago

Thank you for the quick reply. Do you mean that using all of the simplex data (dx:0 and dx:-1) would be better?

cjw85 commented 2 weeks ago

The error modes of duplex reads are different from simplex, so medaka is likely to misinterpret them. There's a minor secondary issue that medaka makes no particular allowance for dependence between reads (so providing both parts of a duplex read might trip it up).
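For anyone wanting to act on this, here is a minimal sketch of one way to drop duplex reads before running medaka. It assumes the reads are available as SAM text and that the basecaller wrote the dx tag as an integer SAM tag (`dx:i:N`), with 1 for duplex reads, 0 for simplex reads without duplex offspring, and -1 for simplex reads with duplex offspring, as the values in this thread suggest. The function names and the `keep_untagged` option are hypothetical, not part of medaka.

```python
# Sketch only: keeps simplex records from SAM text based on the dx tag.
# Assumed tag meaning: 1 = duplex, 0 = simplex without duplex offspring,
# -1 = simplex with duplex offspring.
SIMPLEX_DX = {0, -1}


def dx_value(sam_line):
    """Return the integer dx tag of a SAM record, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for field in fields[11:]:
        if field.startswith("dx:i:"):
            return int(field[len("dx:i:"):])
    return None


def keep_simplex(lines, keep_untagged=True):
    """Yield header lines and simplex records; skip duplex records (dx:1).

    Records without a dx tag are kept by default, on the assumption that
    they come from a plain simplex basecalling run.
    """
    for line in lines:
        if line.startswith("@"):  # SAM header line
            yield line
            continue
        dx = dx_value(line)
        if dx is None:
            if keep_untagged:
                yield line
        elif dx in SIMPLEX_DX:
            yield line
```

The generator accepts any iterable of SAM lines, so it could be fed from a file handle or from the text output of `samtools view -h` and its result converted back to BAM or FASTQ with your usual tools. It only addresses the duplex/simplex question; selecting "pass" reads is left to the sequencer's own classification.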

hungweichen0327 commented 2 weeks ago

Many thanks for the suggestion!