malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Handle chunks parameter given as a size in memory; improve default chunks #622

Closed alimanfoo closed 2 months ago

alimanfoo commented 2 months ago

Further work to improve the default dask chunk sizes and to add options for controlling or changing chunk sizes where needed. These changes are based on experience running large computations over genotype data on a distributed cluster and observing performance and memory usage within the cluster.
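
For illustration only, here is a minimal sketch of how a chunks value given as a size in memory (e.g. "64MiB") could be turned into a concrete chunk shape. This is not the code in this pull request: `parse_size` and `chunks_from_size` are hypothetical helper names, and splitting along only the first dimension is an assumption made to keep the example short.

```python
import math
import re

# Byte multipliers for a few common unit suffixes (decimal and binary).
_UNITS = {
    "b": 1,
    "kb": 1000,
    "mb": 1000**2,
    "gb": 1000**3,
    "kib": 1024,
    "mib": 1024**2,
    "gib": 1024**3,
}


def parse_size(size):
    """Convert a string such as '64MiB' or '1.5 kB' into a number of bytes.

    Hypothetical helper, written for this example only.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*", size)
    if match is None:
        raise ValueError(f"Could not parse size: {size!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"Unknown unit in size: {size!r}") from None
    return int(float(number) * factor)


def chunks_from_size(shape, itemsize, size):
    """Choose a chunk shape whose in-memory size is at most `size`.

    Assumption: only the first dimension is split; all other dimensions
    are kept whole. At least one row is always included, so a single row
    larger than `size` still yields a valid chunk shape.
    """
    target = parse_size(size)
    row_bytes = itemsize * math.prod(shape[1:])
    rows = max(1, min(shape[0], target // row_bytes))
    return (rows,) + tuple(shape[1:])


# Example: a (variants, samples, ploidy) array of 1-byte genotype calls.
example_chunks = chunks_from_size((1_000_000, 100, 2), itemsize=1, size="1MiB")
```

Here each row occupies 100 × 2 × 1 = 200 bytes, so a 1 MiB target allows 1048576 // 200 = 5242 rows per chunk. Dask itself also accepts a byte-size string as a `chunks` value for array creation and rechunking, which may be preferable to a hand-rolled calculation where it fits.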

Also includes some fixes so that the GCS bucket paths point to the new single-region buckets.
