Based on debugging this locus in hg19: chr10:25309168-25309203, a "TATATC" repeat. This locus is problematic because the flanking region is similar to the repeat. Multiple genotypes with short repeats were called, all of which looked incorrect based on capillary electrophoresis data.
Here is the reference region:
Example read misclassified as "enclosing" with 4 copies. It was classified as enclosing because both its start and its end match the flanking sequence.
I added a check to make sure that reads that start or end in the repeat region cannot be classified as enclosing.
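A minimal sketch of the idea behind that check, not the actual code in this change. The names `Interval` and `can_be_enclosing` are hypothetical, and the sketch assumes 1-based inclusive coordinates like the locus string above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical 1-based, inclusive genomic interval."""
    start: int
    end: int


def can_be_enclosing(read: Interval, repeat: Interval) -> bool:
    """Return True only if the read has aligned bases on both sides of the repeat.

    A read whose first or last base falls inside the repeat region cannot
    span the whole repeat, so it must not be classified as enclosing.
    """
    return read.start < repeat.start and read.end > repeat.end


# The locus discussed above (hg19 chr10:25309168-25309203).
REPEAT = Interval(25309168, 25309203)

# Read positions below are made up for illustration only.
spans_repeat = can_be_enclosing(Interval(25309150, 25309220), REPEAT)
starts_in_repeat = can_be_enclosing(Interval(25309170, 25309220), REPEAT)
ends_in_repeat = can_be_enclosing(Interval(25309150, 25309200), REPEAT)
```

With these made-up positions, only the first read is eligible to be called enclosing. The other two start or end inside the repeat and would have to be handled as some other read class.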
Another example:
Also classified as enclosing with 4 copies. This happened because expansion-aware realignment breaks if the start and end of the read match the flanking region. I don't think that is a good check, so I commented it out.