tfwillems / HipSTR

Genotype and phase short tandem repeats using Illumina whole-genome sequencing data
GNU General Public License v2.0

problem with FLANK_ASSEMBLY_CYCLIC #54

Closed lvf1990 closed 6 years ago

lvf1990 commented 6 years ago

Our STR loci were filtered by 'FLANK_ASSEMBLY_CYCLIC' in 20% of our samples. Do you have any advice about these loci? Can we just increase the k-mer size in your script?

tfwillems commented 6 years ago

Hi @lvf1990,

This issue typically occurs if there's another repeat near your STR of interest.

As you suggested, one option would be to increase the value of MAX_KMER in src/seq_stutter_genotyper.h from 15 to a higher value and then recompile HipSTR.
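To illustrate why a larger k-mer size can help, here is a minimal Python sketch. It assumes the filter reflects a cycle in a k-mer (de Bruijn-style) graph built from the flanking sequence, in which case the graph of a single sequence is cyclic exactly when some k-mer occurs more than once. The function names and the example flank are hypothetical and are not taken from HipSTR's source.

```python
def has_repeated_kmer(seq, k):
    """Return True if any k-mer occurs more than once in seq.

    For a graph whose nodes are the k-mers of a single sequence,
    a repeated k-mer means the path revisits a node, i.e. a cycle.
    """
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def smallest_acyclic_k(seq, k_min, k_max):
    """Return the smallest k in [k_min, k_max] with no repeated k-mer, else None."""
    for k in range(k_min, k_max + 1):
        if not has_repeated_kmer(seq, k):
            return k
    return None


# Hypothetical flank containing a 20 bp AT repeat.
flank = "CCG" + "AT" * 10 + "GGC"

print(has_repeated_kmer(flank, 15))       # cyclic at k = 15
print(smallest_acyclic_k(flank, 10, 15))  # no suitable k up to 15
print(smallest_acyclic_k(flank, 10, 25))  # a larger upper bound finds one
```

In this toy case a 20 bp dinucleotide repeat in the flank cannot be resolved with k up to 15, but a higher upper bound allows a k-mer long enough to span it.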

Another option would be to adjust the coordinates of the region you're inputting into HipSTR. For example, if your main STR is a GATA repeat and there's another ATCG repeat 5 bp upstream of it in the flanking sequence, you could modify the repeat coordinates so that the region captures both repeats (i.e. START = left position of ATCG, END = right position of GATA). I'd only recommend this if the main STR and the flanking repeat have the same repeat motif length (i.e. don't do this if there's an ATATATAT... repeat near your GATA repeat).
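The coordinate adjustment can be sketched in a few lines of Python. This is only an illustration of the rule described above: the helper name, the tuple layout, and the coordinates are all hypothetical, and it does not write a HipSTR region file.

```python
def merged_region(main_str, flank_repeat):
    """Merge a main STR and a nearby flanking repeat into one region.

    Each argument is a (start, end, motif) tuple. Merging is refused
    when the motif lengths differ, following the advice above.
    """
    main_start, main_end, main_motif = main_str
    flank_start, flank_end, flank_motif = flank_repeat
    if len(main_motif) != len(flank_motif):
        raise ValueError(
            "motif lengths differ (%d vs %d); keep the regions separate"
            % (len(main_motif), len(flank_motif))
        )
    # START = leftmost position, END = rightmost position of the two repeats.
    return min(main_start, flank_start), max(main_end, flank_end)


# Hypothetical coordinates: an ATCG repeat ending 5 bp upstream of a GATA repeat.
gata = (1025, 1064, "GATA")
atcg = (1000, 1019, "ATCG")
print(merged_region(gata, atcg))  # region spanning both repeats

# A dinucleotide repeat near the GATA repeat should not be merged.
at_repeat = (1000, 1019, "AT")
try:
    merged_region(gata, at_repeat)
except ValueError as err:
    print("not merged:", err)
```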

If you'd be willing to send me a BAM file and the repeat you're interested in (hipstrtool at gmail), I'd be happy to take a closer look.

Best, Thomas