Calculate squared correlation to assess imputation performance

Another metric is the squared correlation (SR), which is simply the square of the Pearson correlation coefficient between the allele dosage (AD) of the true genotypes and the AD of the imputed genotypes. AD counts the alternate alleles carried at a site, so in a diploid genome the AD of 0|0 is 0, the AD of 0|1 and 1|0 is 1, and the AD of 1|1 is 2. SR is pertinent to GWAS because it has been shown that a higher mean SR across sites can translate into greater power to detect variants associated with a trait or disease. Linking #2193.
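As a minimal sketch of the metric described above (plain Python, no tskit involved), the following converts phased genotype strings to AD and computes SR as the squared Pearson correlation. The helper names `allele_dosage` and `squared_correlation` are hypothetical. The sketch assumes biallelic sites written as "0|1"-style strings, and it returns NaN at sites where either dosage vector is constant, since the correlation is undefined there.

```python
import math


def allele_dosage(genotype: str) -> int:
    # Assumes a phased, biallelic genotype string such as "0|1":
    # 0|0 -> 0, 0|1 and 1|0 -> 1, 1|1 -> 2.
    left, right = genotype.split("|")
    return int(left) + int(right)


def squared_correlation(true_ad, imputed_ad):
    """Square of the Pearson correlation between two dosage vectors.

    Returns NaN when either vector has zero variance (e.g. a monomorphic
    site), because the correlation is undefined there.
    """
    n = len(true_ad)
    if n != len(imputed_ad) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    mean_x = sum(true_ad) / n
    mean_y = sum(imputed_ad) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(true_ad, imputed_ad))
    var_x = sum((x - mean_x) ** 2 for x in true_ad)
    var_y = sum((y - mean_y) ** 2 for y in imputed_ad)
    if var_x == 0 or var_y == 0:
        return math.nan
    # r^2 = cov^2 / (var_x * var_y); the 1/n factors cancel.
    return cov * cov / (var_x * var_y)


true_ad = [allele_dosage(g) for g in ["0|0", "0|1", "1|1", "1|0"]]
imputed_ad = [allele_dosage(g) for g in ["0|0", "1|1", "1|1", "0|0"]]
print(squared_correlation(true_ad, imputed_ad))
```

Here the true dosages are 0, 1, 2, 1 and the imputed dosages are 0, 2, 2, 0, which gives SR = 0.5.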

It would be useful to have a method on the `Variant` class that returns SR site by site, for example:

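```python
# Proposed usage: Variant.squared_correlation() is the method requested here,
# not an existing tskit API. Assumes both tree sequences share the same sites
# in the same order.
site_sq_corr = []
for variant1, variant2 in zip(ts_true.variants(), ts_imputed.variants()):
    sq_corr = variant1.squared_correlation(variant2)
    site_sq_corr.append(sq_corr)
```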
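Until such a method exists, per-site SR can be sketched from the haploid genotype arrays that tskit already exposes through `Variant.genotypes`. The function names below are hypothetical, and the sketch assumes that consecutive sample nodes (0 and 1, 2 and 3, ...) form one diploid individual, that alleles are coded 0/1, that there is no missing data (tskit codes missing genotypes as -1), and that both inputs list the same sites in the same order. With tree sequences one would call it as `site_squared_correlations((v.genotypes for v in ts_true.variants()), (v.genotypes for v in ts_imputed.variants()))`.

```python
import math
from itertools import zip_longest


def _r2(x, y):
    # Squared Pearson correlation; NaN if either vector has zero variance.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return math.nan if vx == 0 or vy == 0 else cov * cov / (vx * vy)


def diploid_dosages(haploid_genotypes):
    # Assumption: sample nodes i and i + 1 (i even) belong to the same
    # diploid individual, and alleles are coded 0 (ref) / 1 (alt).
    g = list(haploid_genotypes)
    if len(g) % 2:
        raise ValueError("expected an even number of sample nodes")
    return [g[i] + g[i + 1] for i in range(0, len(g), 2)]


def site_squared_correlations(true_sites, imputed_sites):
    # Each argument yields one haploid genotype array per site, in the same
    # site order (e.g. v.genotypes for v in ts.variants()).
    site_sq_corr = []
    for g_true, g_imp in zip_longest(true_sites, imputed_sites):
        if g_true is None or g_imp is None:
            raise ValueError("true and imputed data have different numbers of sites")
        site_sq_corr.append(_r2(diploid_dosages(g_true), diploid_dosages(g_imp)))
    return site_sq_corr
```

`zip_longest` is used instead of plain `zip` so that a mismatch in the number of sites raises an error rather than being silently truncated.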

tskit-dev/tskit #2200