ornl-oxford / genben

Benchmarking of software frameworks and systems for storage and compute over large-scale genomic data.
MIT License

Implement a Zarr conversion #23

Closed ebegoli closed 6 years ago

ebegoli commented 6 years ago

Add an option to create and store Zarr data artifacts before running the actual analytics benchmark.

For benchmarking we will use the full chromosome 22 data from the human 1000 Genomes phase 3 release, and we want to convert that file set to Zarr before running the benchmarks.

We have uploaded a pre-built Zarr version of this data to the FTP server here:

ftp://ngs.sanger.ac.uk/production/ag1000g/misc/genomics-benchmarks/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.zarr/

Example code for how to run the VCF to Zarr conversion is in this notebook:
https://github.com/ornl-oxford/genomics-benchmarks/blob/master/notebooks/1000-genomes-vcf-to-zarr.ipynb
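As a rough sketch of the "create and store before benchmarking" option, something like the following could work. It is not taken from the notebook: `zarr_path_for` and `ensure_zarr` are hypothetical helper names, and the default converter assumes scikit-allel's `allel.vcf_to_zarr` is the conversion routine. The conversion only runs when the Zarr store does not already exist, so a pre-built copy (such as the one on the FTP server) would be reused as-is.

```python
import os
from typing import Callable, Optional


def zarr_path_for(vcf_path: str, output_dir: str) -> str:
    """Derive a Zarr store path from a VCF file name (.vcf or .vcf.gz)."""
    name = os.path.basename(vcf_path)
    for suffix in (".vcf.gz", ".vcf"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return os.path.join(output_dir, name + ".zarr")


def _convert_with_scikit_allel(vcf_path: str, zarr_path: str) -> None:
    """Default converter (assumption: scikit-allel is installed)."""
    # Imported lazily so this module loads even without scikit-allel.
    import allel

    # fields="*" requests all VCF fields; narrow this to speed up conversion.
    allel.vcf_to_zarr(vcf_path, zarr_path, fields="*", overwrite=True)


def ensure_zarr(
    vcf_path: str,
    output_dir: str,
    convert: Optional[Callable[[str, str], None]] = None,
) -> str:
    """Return the Zarr store path, converting the VCF only if the store is missing."""
    zarr_path = zarr_path_for(vcf_path, output_dir)
    if not os.path.exists(zarr_path):
        converter = convert if convert is not None else _convert_with_scikit_allel
        converter(vcf_path, zarr_path)
    return zarr_path
```

A benchmark runner could call `ensure_zarr(vcf_path, data_dir)` once during setup and pass the returned path to the analytics step, which keeps conversion time out of the measured benchmark.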