szhan / onekg_analysis

Evaluation of genotype imputation methods using the unified genealogy dataset
MIT License

Replace use of `tsinfer.SampleData` with `sgkit` #2

Closed szhan closed 1 year ago

szhan commented 1 year ago

This should greatly facilitate all sorts of analyses of the genotype data.

szhan commented 1 year ago

Being addressed in #1

szhan commented 1 year ago

For now, the quickest way to take advantage of sgkit is to convert all of the genotype data to VCF files before running comparisons.

For example,

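import sgkit as sg
from sgkit.io.vcf import vcf_to_zarr

# Convert the BEAGLE-imputed VCF to Zarr, then load it as an sgkit dataset.
beagle_vcf_file = "../analysis/beagle/target.beagle.vcf.gz"
beagle_zarr_file = "target.beagle.zarr"
vcf_to_zarr(beagle_vcf_file, beagle_zarr_file)
beagle_ds = sg.load_dataset(beagle_zarr_file)
beagle_ds  # Show the dataset summary (e.g. in a notebook).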
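
Once two datasets are loaded this way, one possible comparison is genotype concordance. The following is only a minimal sketch, not the method used in this repo: `genotype_concordance` is a hypothetical helper, and it assumes both inputs follow the layout of sgkit's `call_genotype` variable (variants × samples × ploidy, with -1 marking a missing allele) and list the same variants and samples in the same order.

```python
def genotype_concordance(truth, imputed):
    """Fraction of non-missing genotype calls whose unordered alleles match.

    Both arguments are nested sequences indexed as [variant][sample][allele],
    mirroring the assumed layout of sgkit's `call_genotype` (missing = -1).
    """
    matches = 0
    compared = 0
    for truth_variant, imputed_variant in zip(truth, imputed):
        for truth_call, imputed_call in zip(truth_variant, imputed_variant):
            # Sort alleles so that phase is ignored (0|1 equals 1|0).
            t = sorted(int(a) for a in truth_call)
            i = sorted(int(a) for a in imputed_call)
            # Skip calls with any missing allele in either dataset.
            if -1 in t or -1 in i:
                continue
            compared += 1
            matches += t == i
    return matches / compared if compared else float("nan")


# Toy data: 2 variants x 2 samples x ploidy 2 (made up for illustration).
truth = [[[0, 0], [0, 1]], [[1, 1], [-1, -1]]]
imputed = [[[0, 0], [1, 0]], [[0, 1], [1, 1]]]
concordance = genotype_concordance(truth, imputed)
print(concordance)
```

With real data, something like `beagle_ds.call_genotype.values` could be passed in place of the toy lists, provided the two datasets have first been aligned on variants and samples.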
szhan commented 1 year ago

Addressed in a separate repo (see tsimpute issue 99).