Open hyanwong opened 1 year ago
Shing says: "Castedo suggested using info-theoretic measures (e.g. mutual information) to assess imputation performance."
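As one concrete reading of the mutual-information suggestion, here is a minimal sketch that scores imputed alleles against true alleles using the empirical mutual information of their joint distribution. The function name `mutual_information` and the `num_alleles` parameter are made up for illustration, and it assumes non-empty integer allele arrays; it isn't taken from the linked notebook or from any library.

```python
import numpy as np


def mutual_information(true_calls, imputed_calls, num_alleles=2):
    """Empirical mutual information (in bits) between two integer allele arrays.

    Hypothetical helper: assumes both arrays are non-empty, have the same
    shape, and hold allele indexes in the range [0, num_alleles).
    """
    true_calls = np.asarray(true_calls).ravel()
    imputed_calls = np.asarray(imputed_calls).ravel()

    # Joint count table: rows index the true allele, columns the imputed allele.
    joint = np.zeros((num_alleles, num_alleles))
    np.add.at(joint, (true_calls, imputed_calls), 1)
    joint /= joint.sum()

    # Marginals, and the joint distribution expected under independence.
    p_true = joint.sum(axis=1, keepdims=True)
    p_imputed = joint.sum(axis=0, keepdims=True)
    expected = p_true @ p_imputed

    # Sum only over cells that were observed, so that 0 * log(0) counts as 0.
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log2(joint[nonzero] / expected[nonzero])))
```

With balanced biallelic calls, perfect imputation gives 1 bit and an imputation that is independent of the truth gives 0 bits. Unlike plain concordance, the score is not inflated by always guessing the major allele at low-frequency sites.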
See https://github.com/hyanwong/100kG-testing/blob/main/notebooks/MismatchTesting.ipynb for some code that could be used to test this sort of thing.
We should add a notebook that uses the unified genealogy to create genotypes, reads them into sgkit, subsets to random samples and sites, adds a random call mask, runs tsinfer, and outputs some imputation metrics.
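The subsetting, masking and scoring steps can be sketched with plain NumPy, something like the code below. Everything in it is an assumption for illustration: the genotype matrix is random toy data rather than output from the unified genealogy, the sgkit and tsinfer steps are left out entirely, and `impute_major_allele` is a deliberately naive placeholder standing where tsinfer would go. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for genotypes exported from the unified genealogy (assumption):
# rows are sites, columns are haploid samples, values are allele indexes 0/1.
truth = rng.integers(0, 2, size=(200, 50))


def subset_and_mask(genotypes, num_sites, num_samples, mask_fraction, rng):
    """Pick random sites and samples, then draw a random mask of calls to hide."""
    site_idx = np.sort(rng.choice(genotypes.shape[0], size=num_sites, replace=False))
    sample_idx = np.sort(rng.choice(genotypes.shape[1], size=num_samples, replace=False))
    subset = genotypes[np.ix_(site_idx, sample_idx)]
    call_mask = rng.random(subset.shape) < mask_fraction  # True = call is hidden
    return subset, call_mask


def impute_major_allele(genotypes, call_mask):
    """Placeholder imputer (NOT tsinfer): fill each masked call with the most
    common unmasked allele at that site, with ties going to allele 0."""
    alt_count = np.sum((genotypes == 1) & ~call_mask, axis=1)
    observed_count = np.sum(~call_mask, axis=1)
    major = (2 * alt_count > observed_count).astype(genotypes.dtype)
    imputed = genotypes.copy()
    imputed[call_mask] = np.broadcast_to(major[:, None], genotypes.shape)[call_mask]
    return imputed


subset, call_mask = subset_and_mask(
    truth, num_sites=100, num_samples=20, mask_fraction=0.1, rng=rng
)
imputed = impute_major_allele(subset, call_mask)

# Score only the calls that were hidden from the imputer.
concordance = float(np.mean(imputed[call_mask] == subset[call_mask]))
print(f"Concordance at masked calls: {concordance:.3f}")
```

Scoring only the masked entries keeps each metric focused on calls the imputer never saw, so other metrics can be swapped in on the same `imputed[call_mask]` and `subset[call_mask]` pair.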
This isn't ideal: the unified genealogy already contains a number of imputed calls, which in this case we would be taking as ground truth. But it's a start at addressing the question of which imputation metrics are best for testing the quality of inference. Hopefully pretty much all the metrics will show the same pattern, but we'll have to see.