Open szhan opened 2 years ago
This is so simple that maybe what we want is just a method to return the dosage
instead? So, we'd do something like:

    import numpy as np

    site_sq_corr = []
    for variant1, variant2 in zip(ts_true.variants(), ts_imputed.variants()):
        # dosage() is the proposed Variant method; corrcoef returns a 2x2 matrix
        sq_corr = np.corrcoef(variant1.dosage(), variant2.dosage())[0, 1] ** 2
        site_sq_corr.append(sq_corr)
Another metric is the squared correlation (SR), which is simply the square of the Pearson correlation coefficient between the allele dosage (AD) of the true genotypes and the AD of the imputed genotypes. In a diploid genome, the AD of 0|0 is 0, the AD of 0|1 and 1|0 is 1, and the AD of 1|1 is 2. SR is pertinent to GWAS because it has been shown that a higher mean SR across sites can mean higher power to discover variants associated with a trait or disease. Linking #2193.
A function in the Variant class that allows us to get SR site by site would be good.
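
To make the definition concrete, here is a minimal NumPy-only sketch of the per-site calculation. It is not existing tskit API: `allele_dosage` and `squared_correlation` are hypothetical helper names. It assumes biallelic sites coded 0/1, no missing data, and that each individual's haplotypes are adjacent in the genotype array.

```python
import numpy as np


def allele_dosage(genotypes, ploidy=2):
    """Return per-individual allele dosage for one site.

    Assumes `genotypes` is a flat array of 0/1 alleles, one entry per
    sample haplotype, with each individual's haplotypes adjacent.
    """
    g = np.asarray(genotypes)
    if g.size % ploidy != 0:
        raise ValueError("number of haplotypes is not a multiple of ploidy")
    # Group haplotypes by individual and count the 1 alleles.
    return g.reshape(-1, ploidy).sum(axis=1)


def squared_correlation(true_genotypes, imputed_genotypes, ploidy=2):
    """Squared Pearson correlation between true and imputed dosages."""
    d_true = allele_dosage(true_genotypes, ploidy)
    d_imputed = allele_dosage(imputed_genotypes, ploidy)
    # Pearson's r is undefined if either dosage vector is constant
    # (e.g. a site that is monomorphic in the imputed data).
    if d_true.std() == 0 or d_imputed.std() == 0:
        return float("nan")
    r = np.corrcoef(d_true, d_imputed)[0, 1]
    return float(r**2)


# Four diploid individuals: 0|0, 0|1, 1|0, 1|1 -> dosages [0, 1, 1, 2]
true_site = [0, 0, 0, 1, 1, 0, 1, 1]
# Third individual imputed as 1|1 -> dosages [0, 1, 2, 2]
imputed_site = [0, 0, 0, 1, 1, 1, 1, 1]
print(round(squared_correlation(true_site, imputed_site), 3))  # 0.727
```

Returning NaN for a constant dosage vector is one possible design choice. A real implementation would also need to decide how to treat missing genotypes and multiallelic sites.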