rki-mf1 / covsonar

A database-driven system for handling genomic sequences of SARS-CoV-2 and screening genomic profiles.
GNU General Public License v3.0

Optionally mask the start/end of sequences #92

Open matthuska opened 1 year ago

matthuska commented 1 year ago

In GitLab by @hoelzer on Jun 21, 2021, 09:51

It would be great if CovSonar could mask the start/end of a sequence (similar to what Nextstrain does, ...).

The ends are often fuzzy and can lead to false-positive substitution/indel calls. If such sites are included in the profiles, subsequent tasks such as clustering (breakfast, ...) might fail.

EDIT: here is how Nextstrain does it:

https://github.com/nextstrain/ncov/blob/master/defaults/parameters.yaml

# Mask settings determine how the multiple sequence alignment is masked prior to phylogenetic inference.
mask:
  # Number of bases to mask from the beginning and end of the alignment. These regions of the genome
  # are difficult to sequence accurately.
  mask_from_beginning: 100
  mask_from_end: 50
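
To make the request concrete, here is a minimal sketch of what such masking could look like on a single sequence. It is not CovSonar code: the function name `mask_ends` and the choice of `N` as the mask character are assumptions for illustration, and the defaults simply reuse the two values from the Nextstrain config above. It also assumes the sequence is already aligned to the reference, so that "first 100 / last 50 bases" refers to the same genome regions in every sample.

```python
def mask_ends(
    seq: str,
    mask_from_beginning: int = 100,
    mask_from_end: int = 50,
    mask_char: str = "N",
) -> str:
    """Replace the leading and trailing bases of a sequence with mask_char.

    Hypothetical helper, not part of CovSonar. Defaults mirror the
    mask_from_beginning / mask_from_end values quoted above.
    """
    if mask_from_beginning < 0 or mask_from_end < 0:
        raise ValueError("mask lengths must be non-negative")

    n = len(seq)
    # Clamp the head mask to the sequence length.
    start = min(mask_from_beginning, n)
    # Keep the tail mask from overlapping the head mask on short sequences.
    end = max(n - mask_from_end, start)

    # The sequence length is preserved, so downstream coordinates stay valid.
    return mask_char * start + seq[start:end] + mask_char * (n - end)


if __name__ == "__main__":
    example = "ACGT" * 50  # 200 bases
    masked = mask_ends(example)
    print(masked[:5], masked[100:105], masked[-5:])
```

Because the length is unchanged, any substitution or indel called inside the masked regions could then be dropped from the profile before clustering. Whether masking should happen on the sequence or on the stored profile is left open here.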