tfwillems / HipSTR

Genotype and phase short tandem repeats using Illumina whole-genome sequencing data
GNU General Public License v2.0

phasing results #77

Closed zhangguy closed 4 years ago

zhangguy commented 4 years ago

Hi,

Thanks for developing this amazing tool.

If I run HipSTR in physical phasing mode, that is, giving it a phased SNP VCF file via --snp-vcf, can I assume that, for a particular sample at a particular STR locus with PQ > 0.9, the order of the STR alleles is the same as the order of the alleles at a nearby SNP? For instance, if PQ > 0.9, the STR is "ATGCATGCATGC|ATGC", and a nearby SNP (250 bp away) is "A|C" in the input SNP file, can I say that the two haplotypes are "ATGCATGCATGC-----A" and "ATGC-----C"? If not, is there a way to get the haplotypes?

Thanks

tfwillems commented 4 years ago

Yes, that’s exactly right! The phased STR genotypes are reported so that they’re consistent with the allele ordering in the input SNP VCF file. So in theory, the 1st allele of every well-phased SNP call should lie on the same DNA segment as the 1st allele of the phased STR call (and analogously for the 2nd allele).

Best,
Thomas
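For readers who want to apply this pairing rule programmatically, here is a minimal Python sketch. It is not part of HipSTR. The function name `pair_haplotypes` and the `min_pq` parameter are hypothetical, and the function assumes you have already extracted the phased STR allele strings, the phased SNP allele strings, and the PQ value for one diploid sample from the relevant VCF records.

```python
from typing import Optional, Tuple

Haplotype = Tuple[str, str]  # (STR allele, SNP allele) on the same DNA segment


def pair_haplotypes(
    str_gt: str, snp_gt: str, pq: float, min_pq: float = 0.9
) -> Optional[Tuple[Haplotype, Haplotype]]:
    """Pair phased STR and SNP alleles by position (1st with 1st, 2nd with 2nd).

    Hypothetical helper: returns None when the STR call is not confidently
    phased (PQ <= min_pq) or when either genotype is not a phased diploid
    call written with the '|' separator.
    """
    if pq <= min_pq:
        return None
    str_alleles = str_gt.split("|")
    snp_alleles = snp_gt.split("|")
    if len(str_alleles) != 2 or len(snp_alleles) != 2:
        return None
    # Same index in both genotypes means same haplotype.
    return (str_alleles[0], snp_alleles[0]), (str_alleles[1], snp_alleles[1])


if __name__ == "__main__":
    # The example from the question above.
    print(pair_haplotypes("ATGCATGCATGC|ATGC", "A|C", pq=0.95))
```

With the example from the question, the first haplotype carries the STR allele "ATGCATGCATGC" together with SNP allele "A", and the second carries "ATGC" together with "C". The 0.9 cutoff simply mirrors the threshold mentioned in the question and is not a recommended value.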