related-sciences / ukb-gwas-pipeline-nealelab

Pipeline for reproduction of NealeLab 2018 UKB GWAS

Add support for variant ranges to bgen conversion #10

Closed by eric-czech 3 years ago

eric-czech commented 4 years ago

The current script already supports processing a variant range per worker, so it could write out zarr archives like ukb_chr{contig}_rng{start}-{stop}.zarr instead of ukb_chr{contig}.zarr. This still needs to be wired up to the Snakemake rule, though.
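As a rough sketch of the naming scheme above, the snippet below splits a contig into variant index ranges and builds one archive name per range. Only the ukb_chr{contig}_rng{start}-{stop}.zarr pattern comes from this issue; the helper names (`variant_ranges`, `zarr_path`, `range_paths`), the fixed range size, and the half-open `[start, stop)` convention are assumptions made for illustration and are not taken from the actual conversion script.

```python
from typing import Iterator, List, Tuple


def variant_ranges(n_variants: int, range_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) variant index ranges that cover n_variants.

    Assumption: ranges are half-open, i.e. stop is exclusive.
    """
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    for start in range(0, n_variants, range_size):
        yield start, min(start + range_size, n_variants)


def zarr_path(contig: str, start: int, stop: int) -> str:
    """Format an archive name using the pattern mentioned in this issue."""
    return f"ukb_chr{contig}_rng{start}-{stop}.zarr"


def range_paths(contig: str, n_variants: int, range_size: int) -> List[str]:
    """List one archive name per variant range for a contig (hypothetical helper)."""
    return [zarr_path(contig, s, e) for s, e in variant_ranges(n_variants, range_size)]


if __name__ == "__main__":
    # Toy numbers for illustration only; real contigs hold far more variants.
    for path in range_paths("21", n_variants=250, range_size=100):
        print(path)
```

A list like this could serve as the set of expected outputs for a Snakemake rule with `start` and `stop` wildcards, though how the rule is actually wired up is still open here.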

My sense from running operations against the full zarr archives is that they will be impractical for any kind of analysis. For this project specifically, the sample / variant QC filters reduce the dataset by an order of magnitude in size, so it will make the most sense to write out a more analysis-specific subset (https://github.com/related-sciences/ukb-gwas-pipeline-nealelab/issues/9). For this reason, having zarr archives specific to variant ranges shouldn't be an issue.

eric-czech commented 3 years ago

I'm skipping this. It would only be necessary if you wanted to rechunk everything quickly, but there would be limits to that anyhow, since the file downloads are slow and you would have to download files of up to ~180GB just to parse out small ranges. It is still hypothetically possible to rechunk everything in one day, but it will never be much faster than that.