quinlan-lab / STRling

Detect novel (and reference) STR expansions from short-read data
MIT License

Full diploid genotypes for 2 short alleles #91

Open: seboyden opened this issue 3 years ago

seboyden commented 3 years ago

The current behavior is to report "nan" for allele2 when it is short. It would be nice to have an estimated size for both allele1 and allele2, even when both alleles are short.
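A minimal sketch of how both short alleles could be estimated instead of reporting "nan", assuming each read that fully spans the locus yields an integer repeat-unit count. This is not STRling's implementation: `genotype_short_alleles`, `min_reads`, and `min_gap` are hypothetical names, and the exact 1-D 2-means split used here is just one possible way to separate the reads into two alleles.

```python
from statistics import median
from typing import List, Tuple


def _sse(group: List[int]) -> float:
    """Sum of squared deviations from the group mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)


def genotype_short_alleles(counts: List[int], min_reads: int = 2,
                           min_gap: float = 2.0) -> Tuple[float, float]:
    """Estimate both allele sizes (repeat units) from spanning-read counts.

    Hypothetical helper: `counts` holds one repeat-unit count per spanning read.
    The sorted counts are cut at the point that minimises the total
    within-group squared error (an exact 1-D 2-means split). If no cut leaves
    `min_reads` reads on each side, or the two group medians differ by less
    than `min_gap` units, the locus is called homozygous.
    """
    if not counts:
        raise ValueError("no spanning reads")
    xs = sorted(counts)
    best = None  # (cost, cut index) of the best split found so far
    for cut in range(min_reads, len(xs) - min_reads + 1):
        cost = _sse(xs[:cut]) + _sse(xs[cut:])
        if best is None or cost < best[0]:
            best = (cost, cut)
    if best is not None:
        a1, a2 = median(xs[:best[1]]), median(xs[best[1]:])
        if a2 - a1 >= min_gap:
            return float(a1), float(a2)
    # Too few reads to split, or groups too close: report one size for both.
    allele = float(median(xs))
    return allele, allele
```

For example, spanning-read counts of 10, 10, 11, 10, 15, 16, 15, 15 give estimates of 10 and 15 repeat units, while 10, 10, 11, 11, 10 is called homozygous at 10 because the two candidate groups differ by less than `min_gap`.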

hdashnow commented 3 years ago

Current thinking on this, both for my own future reference and to get feedback:

It is extremely difficult to distinguish the sizes of two different large alleles, so we need some way to flag this in the output. For shorter alleles, we could provide a point estimate, possibly along with min/max bounds or error bars, and also report the number of reads supporting each allele. We should also consider how to provide a quality score, based, for example, on the number of supporting reads or on how difficult it is to assign reads between two alleles of similar size.

Is part of the problem determining whether the individual truly has two short alleles, or only appears to because a larger allele was missed? In practical terms, I imagine this would mostly be used to eliminate possible disease loci from consideration when the individual has two short alleles and no large ones.
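The per-allele reporting and quality score described above could look something like the sketch below. It assumes the spanning reads have already been assigned to the two alleles; `AlleleSummary`, `summarize`, `separation_score`, `quality_flag`, and both thresholds are hypothetical, and the median-gap-over-spread score is only one way to capture how hard it is to tell two similar-sized alleles apart.

```python
from dataclasses import dataclass
from statistics import median, pstdev
from typing import List


@dataclass
class AlleleSummary:
    estimate: float  # point estimate: median repeat-unit count of assigned reads
    low: int         # smallest count among assigned reads
    high: int        # largest count among assigned reads
    n_reads: int     # number of reads supporting this allele


def summarize(reads: List[int]) -> AlleleSummary:
    """Point estimate, min/max range, and read support for one allele."""
    return AlleleSummary(float(median(reads)), min(reads), max(reads), len(reads))


def separation_score(a1: List[int], a2: List[int]) -> float:
    """Gap between the allele medians divided by the summed read spread.

    Large when the two read groups are well separated; near zero when the
    alleles are similar in size and reads could plausibly belong to either.
    """
    gap = abs(median(a2) - median(a1))
    spread = pstdev(a1) + pstdev(a2)  # population std dev of each group
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def quality_flag(a1: List[int], a2: List[int],
                 min_reads: int = 3, min_score: float = 2.0) -> str:
    """Label a two-allele call; thresholds are illustrative, not calibrated."""
    if min(len(a1), len(a2)) < min_reads:
        return "LOW_SUPPORT"
    if separation_score(a1, a2) < min_score:
        return "AMBIGUOUS_SPLIT"
    return "PASS"
```

With reads of 10, 10, 10, 11 and 15, 15, 15, 16, the call passes; with 10, 11, 12 and 11, 12, 13 it is flagged as ambiguous. A score like this only addresses ambiguity between two short alleles; it cannot detect a missed larger allele, which would likely need evidence from reads that do not fully span the repeat.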