I switched to using ~128 MiB chunks based on dask's normalize_chunks with 'auto' size: https://github.com/related-sciences/ukb-gwas-pipeline-nealelab/commit/c9d3556dce39466f6eec26cad9c9b99a9fa09b97#diff-60222f017a42f5713f6523e0bd78d538R291
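For reference, a minimal sketch of how dask's `normalize_chunks` can be asked for ~128 MiB blocks (the array shape and sample count here are illustrative assumptions, not exactly what the linked commit does):

```python
import numpy as np
from dask.array.core import normalize_chunks

# Illustrative genotype-probability array for chromosome 21:
# (variants, samples, genotypes) -- the sample count is an assumption.
shape = (1_261_158, 487_409, 3)

# Let dask choose chunk sizes along the variant and sample axes so that
# each block is roughly 128 MiB of float16 data.
chunks = normalize_chunks(
    ("auto", "auto", -1),   # keep the genotype axis unchunked
    shape=shape,
    limit="128 MiB",
    dtype=np.float16,
)
print(chunks)
```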
I chose to target 512 MiB chunks for the initial zarr conversion from bgen. This is an uncomfortably large block size for default Dask cluster resource recommendations. I am finding that at least 8 GB of RAM per core is necessary to use these chunkings directly in a GWAS, while the standard n1-highmem ratio is about 6.5 GB/vCPU and the n1-standard ratio is about 3.75 GB/vCPU.
For chromosome 21, this results in 5,324 float16 chunks of 571.72 MB each with shape (10432, 9134, 3) (for 1,261,158 variants). The number of tasks associated with building these chunks is about 74,537. This means the expected number of tasks across all chromosomes is something like 74,537 × (97M / 1.261158M) ≈ 5.73M. According to this, if we assume about a millisecond of scheduling latency per task, then our chunking induces roughly 5.73M × 0.001 s ≈ 5,730 seconds ≈ 1.6 hours of overhead.
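A back-of-the-envelope sketch of that arithmetic (all figures are taken from the numbers above; the ~1 ms per-task scheduler latency is the stated assumption):

```python
import numpy as np

# Per-chunk size for the chr21 chunking described above.
chunk_shape = (10_432, 9_134, 3)
chunk_bytes = np.prod(chunk_shape) * np.dtype(np.float16).itemsize
print(f"chunk size: {chunk_bytes / 1e6:.2f} MB")              # ~571.72 MB

# Extrapolate the chr21 task count to all ~97M variants.
chr21_tasks = 74_537
chr21_variants = 1_261_158
total_variants = 97_000_000
total_tasks = chr21_tasks * (total_variants / chr21_variants)
print(f"projected tasks: {total_tasks / 1e6:.2f}M")           # ~5.73M

# Assumed scheduling latency of ~1 ms per task.
scheduling_latency_s = 0.001
overhead_s = total_tasks * scheduling_latency_s
print(f"scheduling overhead: {overhead_s:.0f} s (~{overhead_s / 3600:.1f} h)")
```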
That much latency isn't significant in the context of the whole operation, so the task count could likely be doubled or quadrupled without much impact.
tl;dr We should switch to 128 MiB chunks instead of 512 MiB chunks.