nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Question about -g option #399

Closed hyunjokoo closed 1 year ago

hyunjokoo commented 1 year ago

Hello,

As I recall, the -g option was introduced in v1.1.2.

With the -g option, Medaka breaks contigs as previous versions did.

The help text describes it as follows: -g don't fill gaps in consensus with draft sequence.

Let's say there is a contig that, with the -g option, was cut into two pieces. In this case, if we do not use the -g option, does Medaka find the gap sequence in the draft sequence and join the pieces together? If Medaka cannot find the gap sequence, does it leave the two sequences as they are, or does it leave the original contig as it is, without modification?

I hope Medaka finds the gap sequence and, if it cannot, leaves the cut pieces separate instead of keeping an incorrectly merged original contig sequence.

Thank you. Hyun Jo Koo

cjw85 commented 1 year ago

I'm not sure what you mean by:

If Medaka cannot find the gap sequence

Medaka will always find the gap sequence in the reference contig: it is by definition present in the reference sequence.

Perhaps the terminology is not clear: a "gap" is a contiguous span of the reference sequence which was not analysed by medaka because there was no read coverage in the provided BAM file. When this is detected, either the gap is filled with sequence from the reference or two contigs are output. The -g option controls which of these is performed.
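
To make the two behaviours concrete, here is a minimal Python sketch of the idea. It is not medaka's actual code: the `stitch` function, the `Chunk` tuple layout, and the `fill_gaps` parameter are hypothetical names chosen for illustration, and the handling of uncovered contig ends is an assumption. Each chunk stands for a span of the draft that had read coverage and was polished; anything between chunks is a "gap" in the sense described above.

```python
from typing import List, Tuple

# Hypothetical layout: (draft_start, draft_end, polished_sequence),
# with half-open coordinates on the draft contig.
Chunk = Tuple[int, int, str]


def stitch(draft: str, chunks: List[Chunk], fill_gaps: bool = True) -> List[str]:
    """Join polished chunks along one draft contig.

    fill_gaps=True  : uncovered spans are copied from the draft (one contig out).
    fill_gaps=False : the contig is broken at each uncovered span (like -g).
    """
    contigs: List[str] = []
    current: List[str] = []
    pos = 0  # next draft position not yet accounted for
    for start, end, seq in sorted(chunks):
        if start > pos:  # draft[pos:start] had no read coverage
            if fill_gaps:
                current.append(draft[pos:start])
            elif current:
                contigs.append("".join(current))
                current = []
        current.append(seq)
        pos = end
    # Assumption for this sketch: an uncovered tail is also taken from the draft.
    if fill_gaps and pos < len(draft):
        current.append(draft[pos:])
    if current:
        contigs.append("".join(current))
    return contigs


draft = "AAAACCCCGGGGTTTT"
chunks = [(0, 4, "AAAT"), (8, 12, "GGGA")]  # draft[4:8] has no coverage
filled = stitch(draft, chunks, fill_gaps=True)
split = stitch(draft, chunks, fill_gaps=False)
```

In this toy case `filled` holds a single contig in which the uncovered `CCCC` (and the uncovered tail) come straight from the draft, while `split` holds the two polished pieces separately. Note that the gap sequence is never "searched for": it is simply the draft sequence at those coordinates, which is why it can always be found.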