szhan / onekg_analysis

Evaluation of genotype imputation methods using the unified genealogy dataset
MIT License

Split `prepare_dataset.ipynb` into separate notebooks #4

Closed: szhan closed this issue 1 year ago

szhan commented 1 year ago

Moved from https://github.com/szhan/tsimpute/issues/93

Right now, this one notebook does the following:

It is easier to divide them up into the following stages, one per notebook:

szhan commented 1 year ago

Splitting individuals into a reference panel and a target cohort should probably be done in a separate notebook, to avoid re-processing the original unified genealogies.
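A minimal sketch of such a split step, assuming individuals are partitioned uniformly at random without replacement. The helper name `split_individuals` and its parameters are hypothetical and do not come from the notebooks. Fixing the seed and saving the two ID lists once would let later notebooks reuse the split without touching the unified genealogies again.

```python
import numpy as np


def split_individuals(individual_ids, num_target, seed=None):
    """Hypothetical helper: randomly partition individuals into a
    reference panel and a target cohort (uniform random split is an assumption)."""
    ids = np.asarray(individual_ids)
    if not 0 < num_target < len(ids):
        raise ValueError("num_target must be between 1 and len(individual_ids) - 1")
    rng = np.random.default_rng(seed)
    # Draw the indices of target individuals without replacement.
    target_idx = rng.choice(len(ids), size=num_target, replace=False)
    is_target = np.zeros(len(ids), dtype=bool)
    is_target[target_idx] = True
    # Both groups keep the original ordering of the input IDs.
    reference = ids[~is_target]
    target = ids[is_target]
    return reference, target


reference, target = split_individuals([f"ind{i}" for i in range(10)], 3, seed=1)
```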

Also, BEAGLE should be run from a separate bash script; there is no need for a Jupyter notebook.

So, the notebook collection should be:

szhan commented 1 year ago

There should be a separate notebook to compare the imputed genotypes from BEAGLE and tskit.lshmm with the true genotypes.
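One common way to make this comparison is per-site genotype concordance. The sketch below assumes biallelic sites with no missing calls, and genotype arrays shaped `(num_sites, num_samples, ploidy)` as in sgkit's `call_genotype` variable. It compares unphased allele dosages, so `0|1` and `1|0` count as a match. The function name is hypothetical, and the notebook may well use a different metric.

```python
import numpy as np


def genotype_concordance(true_gt, imputed_gt):
    """Hypothetical helper: per-site fraction of samples whose imputed
    genotype matches the true genotype, compared as allele dosages.

    Assumes biallelic sites, no missing calls (-1), and arrays of shape
    (num_sites, num_samples, ploidy)."""
    true_gt = np.asarray(true_gt)
    imputed_gt = np.asarray(imputed_gt)
    if true_gt.shape != imputed_gt.shape:
        raise ValueError("genotype arrays must have the same shape")
    # Summing allele indices over the ploidy axis gives the ALT dosage (0, 1, 2).
    true_dosage = true_gt.sum(axis=2)
    imputed_dosage = imputed_gt.sum(axis=2)
    # Average the per-sample matches across samples at each site.
    return (true_dosage == imputed_dosage).mean(axis=1)


true_gt = np.array([[[0, 0], [0, 1]], [[1, 1], [0, 0]]])
imputed_gt = np.array([[[0, 0], [1, 0]], [[1, 0], [0, 0]]])
concordance = genotype_concordance(true_gt, imputed_gt)
```

Dosage-based matching ignores phase, which suits comparing imputed genotypes against truth. Evaluating phasing accuracy would need a separate haplotype-level check.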

So, there are five notebooks in total:

szhan commented 1 year ago

The notebooks `prepare_dataset_*.ipynb` are complete, so this issue is done for now. Completing the other two notebooks will address #1.

szhan commented 1 year ago

I think it would be useful to add a separate notebook (`prepare_dataset_4.ipynb`) just for using sgkit to make compatible genotype datasets from the VCF files, for easier downstream analysis.
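Here is one possible interpretation of "compatible": restrict two genotype datasets to the sites they share, in matching order. The sketch assumes positions are unique within each dataset and uses the array layout of sgkit's `variant_position` and `call_genotype` variables. It matches on position only. A real pipeline would likely also need to check that REF/ALT alleles agree. The helper name is hypothetical.

```python
import numpy as np


def align_by_position(pos_a, gt_a, pos_b, gt_b):
    """Hypothetical helper: keep only sites present in both datasets,
    ordered by position, so genotype arrays line up site by site.

    pos_*: 1-D arrays of unique site positions.
    gt_*:  arrays of shape (num_sites, num_samples, ploidy)."""
    shared, idx_a, idx_b = np.intersect1d(
        pos_a, pos_b, assume_unique=True, return_indices=True
    )
    # idx_a / idx_b index the shared sites in each input, in sorted position order.
    return shared, np.asarray(gt_a)[idx_a], np.asarray(gt_b)[idx_b]


pos_a = np.array([100, 200, 300])
gt_a = np.array([[[0, 0]], [[0, 1]], [[1, 1]]])
pos_b = np.array([300, 100, 400])
gt_b = np.array([[[1, 0]], [[0, 0]], [[0, 1]]])
shared, aligned_a, aligned_b = align_by_position(pos_a, gt_a, pos_b, gt_b)
```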