related-sciences / ukb-gwas-pipeline-nealelab

Pipeline for reproduction of NealeLab 2018 UKB GWAS

Rechunk dosages so there is no chunking in the samples dimension #35

Open tomwhite opened 3 years ago

tomwhite commented 3 years ago

This is one of the key findings from https://github.com/pystatgen/sgkit/issues/390#issuecomment-781503522.

More detail is at https://github.com/pystatgen/sgkit/issues/448#issuecomment-780655217. We should rechunk the dosages array (using rechunker) so that the chunk sizes are {variant: 64, sample: -1}, where -1 means no chunking in the samples dimension.
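To make the target layout concrete, here is a minimal sketch of what a chunk spec like {variant: 64, sample: -1} resolves to. The helper names (`resolve_chunks`, `chunk_grid`) and the array shape are made up for illustration; this is plain Python arithmetic, not rechunker's API.

```python
from math import ceil


def resolve_chunks(shape, dims, chunk_spec):
    """Resolve a {dim: size} chunk spec against an array shape.

    A size of -1 means "no chunking": the chunk spans the whole dimension.
    Dimensions missing from the spec are also left unchunked.
    """
    resolved = []
    for dim, length in zip(dims, shape):
        size = chunk_spec.get(dim, -1)
        resolved.append(length if size == -1 else min(size, length))
    return tuple(resolved)


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension."""
    return tuple(ceil(n / c) for n, c in zip(shape, chunks))


# Hypothetical dosage array: 100,000 variants x 5,000 samples.
shape = (100_000, 5_000)
dims = ("variant", "sample")

chunks = resolve_chunks(shape, dims, {"variant": 64, "sample": -1})
grid = chunk_grid(shape, chunks)

print(chunks)  # (64, 5000): each chunk holds every sample for 64 variants
print(grid)    # (1563, 1): a single chunk along the samples dimension
```

With this layout every chunk contains all samples for its block of variants, so an operation that works across samples for a given variant never has to combine data from several chunks.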

Note that this should be done as a separate rechunking step rather than as part of the run_gwas function in gwas.py, since run_gwas would otherwise repeat the rechunking for each trait group.
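The ordering argued for here can be sketched as follows. Everything in this block is a stand-in: `rechunk_dosages`, `run_gwas_on`, the trait group names, and the starting chunk sizes are hypothetical and only illustrate that the rechunk runs once while the per-trait-group step runs many times.

```python
# Counter used to show how often the (expensive) rechunk step runs.
calls = {"rechunk": 0}


def rechunk_dosages(dataset):
    """Hypothetical stand-in for a one-off rechunking job."""
    calls["rechunk"] += 1
    return {**dataset, "chunks": {"variant": 64, "sample": -1}}


def run_gwas_on(dataset, trait_group):
    """Hypothetical stand-in for the per-trait-group step.

    It only reads the dataset and expects it to be rechunked already.
    """
    assert dataset["chunks"]["sample"] == -1, "dosages must be rechunked first"
    return f"results for {trait_group}"


# Made-up starting layout and trait groups.
dataset = {"name": "dosages", "chunks": {"variant": 1000, "sample": 1000}}
trait_groups = ["group_a", "group_b", "group_c"]

# Rechunk once, up front, then reuse the result for every trait group.
rechunked = rechunk_dosages(dataset)
results = [run_gwas_on(rechunked, group) for group in trait_groups]

print(calls["rechunk"])  # 1, regardless of the number of trait groups
```

If the rechunk call lived inside the per-trait-group function instead, the counter would grow with the number of trait groups, which is the repeated work this issue wants to avoid.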