gymreklab / GangSTR

A tool for profiling long STRs from short reads
GNU General Public License v2.0

Allowing mismatch in Long str #118

Open DomManou opened 2 years ago


Hi,

I am trying to use GangSTR to identify a 52 bp STR in several samples. I identified the STR of interest using the UCSC Genome Browser and RepeatMasker (link: https://genome-euro.ucsc.edu/cgi-bin/hgc?hgsid=286603936_wMmBef9vkWmZKbnIKQ8k97LFAdv3&db=hub_51387_GCA_905237065.2&c=HG993268.2&l=74885285&r=74888665&o=74886109&t=74886738&g=hub_51387_simpleRepeat&i=TGTCTCTCTGACCCAC).

Although the STR looks repetitive in the browser, the motif is not identical in every copy. As a result, in my output .vcf file: 1) the reference allele is shown as a perfectly repeated motif, which does not match the actual reference sequence; and 2) the genotype for this STR is returned as "." for all samples.
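One way to quantify how far such a tract departs from a perfect repeat is to split it into motif-length copies and count mismatches in each copy against the consensus motif. The sketch below is a minimal illustration and is not part of GangSTR. It assumes the tract starts in phase with the motif and contains only substitutions (no insertions or deletions). The motif is taken from the browser link above and assumed to be the repeat unit; the 52 bp sequence and its two substitutions are hypothetical, and the function names are illustrative only.

```python
def split_into_copies(seq: str, motif: str) -> list[str]:
    """Split seq into consecutive motif-length chunks; the last may be partial."""
    k = len(motif)
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def copy_mismatches(seq: str, motif: str) -> list[int]:
    """Substitutions in each copy relative to the motif (assumes no indels)."""
    seq, motif = seq.upper(), motif.upper()
    return [hamming(copy, motif[:len(copy)]) for copy in split_into_copies(seq, motif)]


def purity(seq: str, motif: str) -> float:
    """Fraction of bases that agree with a perfect, in-phase repeat of motif."""
    return 1 - sum(copy_mismatches(seq, motif)) / len(seq)


# Hypothetical example: 3 full copies plus a 4 bp partial copy = 52 bp.
motif = "TGTCTCTCTGACCCAC"
perfect = motif * 3 + motif[:4]
impure = list(perfect)
impure[21] = "A"  # hypothetical substitution in copy 2 (C -> A)
impure[40] = "G"  # hypothetical substitution in copy 3 (T -> G)
impure = "".join(impure)

print(copy_mismatches(impure, motif))  # [0, 1, 1, 0]
print(round(purity(impure, motif), 3))  # 0.962
```

A purity well below 1 is a simple way to flag loci like this one, where the reference tract is an imperfect repeat rather than an exact run of the motif.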

Is there a way to account for possible mismatches within the repeats?

Best regards, Domniki