tfwillems / HipSTR

Genotype and phase short tandem repeats using Illumina whole-genome sequencing data
GNU General Public License v2.0

bwa mapping of diverged alleles #79

Open yannickwurm opened 3 years ago

yannickwurm commented 3 years ago

Hello,

we're considering using HipSTR to genotype PCR-targeted loci. However, I am concerned that some diverged alleles may not map correctly because bwa may penalise long insertions.

E.g. in the attached screenshot, some reads map with an insertion of 16 bp and another insertion of 2 bp. These reads have low mapping quality and there aren't many of them (4 in this screenshot). From this mapping alone it is unclear whether they are contamination or megastutters, or the few reads from a second, longer allele that did manage to map.
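
To put numbers on this kind of observation, the reads can be tallied from the SAM text rather than judged by eye. The following is a minimal sketch, not part of HipSTR or bwa: it assumes the alignments are available as SAM-formatted lines (e.g. from `samtools view`), and the function names and the `min_insertion=10` threshold are hypothetical choices made for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def insertion_lengths(cigar):
    """Return the lengths of all insertion (I) operations in a CIGAR string."""
    if cigar == "*":  # CIGAR unavailable, e.g. for an unmapped read
        return []
    return [int(n) for n, op in CIGAR_OP.findall(cigar) if op == "I"]


def summarise_sam_line(line):
    """Extract read name, MAPQ and insertion lengths from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "name": fields[0],                           # QNAME
        "mapq": int(fields[4]),                      # MAPQ
        "insertions": insertion_lengths(fields[5]),  # parsed from CIGAR
    }


def flag_long_insertions(lines, min_insertion=10):
    """Yield summaries of reads with at least one insertion >= min_insertion.

    min_insertion is an arbitrary illustrative threshold, not a recommended value.
    """
    for line in lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        summary = summarise_sam_line(line)
        if any(n >= min_insertion for n in summary["insertions"]):
            yield summary
```

Comparing the MAPQ values of the flagged reads with those of the remaining reads at the locus would show whether the long-insertion reads are consistently down-weighted by the aligner, although it cannot by itself distinguish contamination from a real allele.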

Are there special mapping parameters you recommend to avoid or reduce mapping bias? Or do HipSTR's internals use unmapped reads from the BAM files to look iteratively for potential alternate alleles?

Thank you in advance for any thoughts & kind regards,

Yannick

[Attached image: Screenshot 2020-10-30 at 15 27 20]