Introduces an option to mark gaps in consensus sequences with a designated character, e.g. 'N'.
Example
Consider the following two contigs aligned to a reference:
reference ACACCGCGGTGTTATA
contigs   ACAC    GTGC
By default, medaka_consensus uses the contig sequence where available and fills gaps by copying the corresponding sequence from the reference. Alternatively, the -g option splits the consensus into separate pieces.
This PR adds the ability to produce a single consensus as in the default mode, but with gaps filled by a designated character such as 'N' instead of reference sequence.
Thanks for the contribution! I'll take care of merging it into our internal repo and getting your commit into the next release (when it will also be pushed to github medaka master).
Addresses #348
The new scheme makes explicit which parts of the consensus are based on data and which parts are based on prior knowledge (the reference).
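The three gap-handling modes can be illustrated with a small Python sketch. This is not medaka's implementation: the function names `stitch` and `split_pieces` are hypothetical, the contig start positions (0 and 8) are an assumed placement of the example contigs on the reference, and the sketch assumes non-overlapping contigs without insertions or deletions. It also treats the uncovered reference ends the same way as internal gaps, which may differ from what medaka does.

```python
from typing import List, Optional, Tuple

# A contig is (0-based start on the reference, contig sequence).
# Assumption: no indels, so a contig spans len(sequence) reference bases.
Contig = Tuple[int, str]


def stitch(reference: str, contigs: List[Contig],
           fill_char: Optional[str] = None) -> str:
    """Join contigs into one sequence.

    Uncovered positions are copied from the reference when fill_char
    is None, otherwise they are replaced by fill_char.
    """
    pieces = []
    pos = 0  # next reference position not yet emitted
    for start, seq in sorted(contigs):
        gap = reference[pos:start]
        pieces.append(gap if fill_char is None else fill_char * len(gap))
        pieces.append(seq)
        pos = start + len(seq)
    tail = reference[pos:]
    pieces.append(tail if fill_char is None else fill_char * len(tail))
    return "".join(pieces)


def split_pieces(contigs: List[Contig]) -> List[str]:
    """Return each contig as its own piece, with no gap filling."""
    return [seq for _, seq in sorted(contigs)]


reference = "ACACCGCGGTGTTATA"
contigs = [(0, "ACAC"), (8, "GTGC")]  # assumed alignment positions

print(stitch(reference, contigs))                 # gaps copied from reference
print(stitch(reference, contigs, fill_char="N"))  # gaps marked with 'N'
print(split_pieces(contigs))                      # separate pieces
```

With the 'N' fill, a reader of the output can tell at a glance which bases come from contigs, whereas in the reference-filled output the two sources are indistinguishable.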
Command line interface
The shell commands to produce the above results are:
The PR also introduces a new option, --fill_char, to medaka stitch.