tfwillems / HipSTR

Genotype and phase short tandem repeats using Illumina whole-genome sequencing data
GNU General Public License v2.0
Are there issues with many related samples? #11

Closed holtgrewe closed 8 years ago

holtgrewe commented 8 years ago

I have a cohort of ~75 individuals in which I would like to genotype STRs. The data consists mostly of trios and quartets, so the genotypes seen in the data are not independent. Is this a problem for HipSTR?

tfwillems commented 8 years ago

We haven't explicitly tested the effect of including related samples. The only place where I think this may have a negative effect is when we're calculating a noise model for each STR locus. But if your 75 samples consist of trios or quads and the families are unrelated to one another, this should be a relatively minor issue. For genotyping, I think including the related samples will actually slightly improve accuracy, as it will make it easier to identify candidate STR haplotypes.
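
To put a rough number on the non-independence raised in the question, here is a minimal sketch that counts how many haplotypes in a cohort of trios and quads come from founders, meaning samples whose parents are not in the cohort. It is not part of HipSTR. The `Sample` type, the `count_haplotypes` helper, and the sample names are all hypothetical, and it assumes a diploid autosomal locus with no de novo mutations.

```python
from typing import List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    """Hypothetical pedigree record; a parent of None means 'not in the cohort'."""
    sample_id: str
    father: Optional[str]
    mother: Optional[str]


def count_haplotypes(samples: List[Sample]) -> Tuple[int, int]:
    """Return (total haplotypes, founder haplotypes) for a diploid autosomal locus.

    Children inherit their haplotypes from their parents, so (ignoring de novo
    mutations) only founder haplotypes are independent observations.
    """
    ids = {s.sample_id for s in samples}
    founders = [s for s in samples if s.father not in ids and s.mother not in ids]
    return 2 * len(samples), 2 * len(founders)


# Illustrative cohort: one trio and one quad, with the two families unrelated.
cohort = [
    Sample("fam1_father", None, None),
    Sample("fam1_mother", None, None),
    Sample("fam1_child", "fam1_father", "fam1_mother"),
    Sample("fam2_father", None, None),
    Sample("fam2_mother", None, None),
    Sample("fam2_child1", "fam2_father", "fam2_mother"),
    Sample("fam2_child2", "fam2_father", "fam2_mother"),
]

total, independent = count_haplotypes(cohort)
print(total, independent)  # 14 8
```

In this toy cohort, 8 of the 14 haplotypes are independent. The children's alleles are redundant for estimating population-level parameters, but they are still extra observations of the same candidate haplotypes.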