Closed: alimanfoo closed this pull request 2 months ago.
Codecov report: all modified and coverable lines are covered by tests :white_check_mark:. Project coverage is 95.56%, comparing base (80820f3) to head (6a70fc7). Report is 14 commits behind head on master.
I'm finding that the native chunks in the zarr genotype data are too small, which means that dask computations generate too many tasks. Increasing the chunk size for genotype arrays helps with larger computations like SNP allele counts and biallelic diplotypes, which are required for PCA, NJT and other analytical functions.
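To illustrate why small native chunks hurt: the number of tasks in a dask graph scales with the number of chunks, so rechunking to larger chunks shrinks the graph. A minimal sketch of the effect (array shapes are illustrative, not the actual genotype data):

```python
import dask.array as da

# Simulate an array with small native chunks along the first dimension.
small = da.zeros((1_000_000, 100), chunks=(10_000, 100))

# The same array rechunked to larger chunks yields far fewer partitions,
# and therefore far fewer tasks per downstream operation.
large = small.rechunk((100_000, 100))

print(small.npartitions)  # 100 partitions
print(large.npartitions)  # 10 partitions
```

Note that rechunking itself adds tasks, so the real win comes from reading the data with larger chunks in the first place, which is what the `chunks` parameter controls.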
This PR adds some new convenience values for the `chunks` parameter which activate automatic chunk size selection, but only for arrays with more than one dimension. This is necessary because automatic size selection for one-dimensional arrays can lead to high memory usage, particularly when applying a site filter.

Also, the default value for the `chunks` parameter has changed to `"ndauto0"`, which I find gives better performance on distributed clusters and has no performance impact either way on Colab.

Note that if using `"auto"` or any of the `"ndauto..."` values, the target chunk size is 128 MiB by default but can be changed, e.g.: