luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

mask-soft-clipped-bases option and local realignment #190

Closed: delafoy closed this issue 3 years ago

delafoy commented 3 years ago

Hi there,

I have an issue with Octopus (0.7.4). I processed amplicon samples with bwa and used samtools ampliconclip to soft-clip the primers. I call variants with --mask-soft-clipped-bases, but it seems that when Octopus realigns reads that contain soft-clipped primer sequence, the soft-clipping information is lost and the base qualities are not modified either, which biases the allele frequencies. When there is no realignment, the soft-clipping is preserved. As a workaround, we hard-clip the primers with samtools ampliconclip.
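Why losing the mask biases allele frequencies can be seen with a naive count-based allele fraction (this is not Octopus's Bayesian model, and the read counts are hypothetical). Bases that come from a primer reflect the oligo sequence rather than the sample, so at a variant under the primer site they always read as reference. If they are not masked, they dilute the alternate allele. `BaseObservation`, `alt_fraction` and `pileup` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class BaseObservation:
    allele: str        # "REF" or "ALT" at the site of interest
    from_primer: bool  # True if the base lies in a primer-derived (clipped) segment
    masked: bool       # True if the caller ignores this base (e.g. quality set to 0)


def alt_fraction(observations):
    """Naive alternate-allele fraction over the bases that are not masked."""
    used = [o for o in observations if not o.masked]
    if not used:
        return float("nan")
    return sum(o.allele == "ALT" for o in used) / len(used)


def pileup(mask_primer_bases):
    # Hypothetical site: 20 template-derived reads, half of them carrying ALT,
    # plus 20 reads whose base here comes from a primer and so always reads REF.
    obs = [BaseObservation("ALT" if i % 2 else "REF", False, False) for i in range(20)]
    obs += [BaseObservation("REF", True, mask_primer_bases) for _ in range(20)]
    return obs


print(alt_fraction(pileup(mask_primer_bases=True)))   # 0.5: primer bases ignored
print(alt_fraction(pileup(mask_primer_bases=False)))  # 0.25: mask lost after realignment
```

The true fraction among template-derived bases is 0.5; counting the unmasked primer bases halves it, which is the kind of bias described above.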

I hope this is useful.

dancooke commented 3 years ago

Hi, when local assembly candidate discovery is enabled, you need to set both --mask-soft-clipped-bases and --soft-clip-mask-threshold to unconditionally mask soft-clipped bases (e.g. --mask-soft-clipped-bases --soft-clip-mask-threshold 60).
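The following is a minimal sketch of why a high threshold makes masking unconditional. It **assumes** that --soft-clip-mask-threshold acts as a base-quality cutoff: soft-clipped bases with quality below it are masked. On that assumption, a value such as 60, which is above the qualities typical Illumina instruments emit, masks every soft-clipped base. This is a hypothetical model, not Octopus source code. `soft_clipped_positions` and `mask_soft_clips` are illustrative names.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_positions(cigar):
    """Return the read-coordinate indices covered by S (soft clip) operations."""
    positions = []
    read_pos = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "S":
            positions.extend(range(read_pos, read_pos + n))
        if op in "MIS=X":  # only these operations consume bases of SEQ
            read_pos += n
    return positions


def mask_soft_clips(quals, cigar, threshold):
    """Set quality to 0 for soft-clipped bases whose quality is below threshold.

    Assumed model of --soft-clip-mask-threshold, for illustration only.
    """
    masked = list(quals)
    for i in soft_clipped_positions(cigar):
        if masked[i] < threshold:
            masked[i] = 0
    return masked


quals = [38] * 4 + [40] * 6 + [35] * 2
print(mask_soft_clips(quals, "4S6M2S", threshold=20))  # nothing masked
print(mask_soft_clips(quals, "4S6M2S", threshold=60))  # all clipped bases masked
```

With a low cutoff, high-quality primer bases would survive; only a cutoff above every observed quality guarantees that all soft-clipped bases are ignored.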

Note that hard-clipping primer sequences prior to variant calling is absolutely the right thing to do. You generally don't want to mask soft-clipped sequence, since it can contain real variation that has simply been misaligned by the mapper; this can often be recovered by assembly.
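Hard-clipping is robust to realignment because the clipped bases are deleted from SEQ and QUAL, so no downstream step can reuse them. The sketch below shows that record-level change by converting terminal soft clips into hard clips. POS is unchanged because it refers to the first aligned base in either case. `soft_to_hard_clip` is a hypothetical helper, and it **assumes** the soft clips it receives are exactly the primer clips (for example, the output of ampliconclip's soft-clip mode). Applying it to mapper-generated soft clips would discard the potentially real variation mentioned above. In practice, samtools ampliconclip can write hard clips directly, as in the workaround described in the issue.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_to_hard_clip(cigar, seq, qual):
    """Turn leading/trailing soft clips into hard clips, trimming SEQ and QUAL."""
    ops = [[int(n), op] for n, op in CIGAR_OP.findall(cigar)]
    lead = trail = 0
    # Leading clip: the first operation after any existing hard clip.
    i = 1 if ops and ops[0][1] == "H" else 0
    if i < len(ops) and ops[i][1] == "S":
        lead = ops[i][0]
        ops[i][1] = "H"
    # Trailing clip: the last operation before any existing hard clip.
    j = len(ops) - 2 if ops and ops[-1][1] == "H" else len(ops) - 1
    if j > i and ops[j][1] == "S":
        trail = ops[j][0]
        ops[j][1] = "H"
    # Merge neighbouring hard clips, e.g. 2H + 3H -> 5H.
    merged = []
    for n, op in ops:
        if merged and merged[-1][1] == "H" and op == "H":
            merged[-1][0] += n
        else:
            merged.append([n, op])
    new_cigar = "".join(f"{n}{op}" for n, op in merged)
    stop = len(seq) - trail
    return new_cigar, seq[lead:stop], qual[lead:stop]


print(soft_to_hard_clip("4S6M2S", "GGCCACGTACTT", "IIIIFFFFFF##"))
# ('4H6M2H', 'ACGTAC', 'FFFFFF')
```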