pangeo-forge / pangeo-forge-recipes

Python library for building Pangeo Forge recipes.
https://pangeo-forge.readthedocs.io/
Apache License 2.0

High level problem case: Files with A LOT OF VARIABLES #736

Open jbusecke opened 6 months ago

jbusecke commented 6 months ago

I have been working on refactoring the community bakery at LEAP (#735) and have one interesting problem case here: https://github.com/leap-stc/wavewatch3_feedstock (in particular, see the code in https://github.com/leap-stc/wavewatch3_feedstock/pull/1).

This dataset is different from many others in at least two ways AFAICT now:

Together, these blow up the memory. I have tested running the recipe while dropping every variable but one, and it works (it still consumes a lot of memory, but it succeeds).

I think the underlying problem here is that a fragment with a ~100MB chunk size on a single variable is still extremely large (~2-3GB) once all variables are included, and as such the workers try to load a bunch of them eagerly and blow up.
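To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The variable count (25) and the number of fragments a worker holds at once (4) are assumptions picked only so the numbers land in the ~2-3GB range mentioned above; the helper names are hypothetical and not part of pangeo-forge-recipes.

```python
MB = 1000**2
GB = 1000**3


def fragment_bytes(n_variables: int, chunk_bytes_per_variable: int) -> int:
    """Size of one fragment if every variable contributes one chunk."""
    return n_variables * chunk_bytes_per_variable


def peak_worker_bytes(
    n_variables: int, chunk_bytes_per_variable: int, fragments_in_flight: int
) -> int:
    """Memory a worker needs if it eagerly holds several fragments at once."""
    return fragments_in_flight * fragment_bytes(n_variables, chunk_bytes_per_variable)


# Assumed values, for illustration only.
n_variables = 25          # assumption: not stated in the issue
chunk = 100 * MB          # ~100MB per variable, as described above
fragments_in_flight = 4   # assumption: depends on the runner

one_var = fragment_bytes(1, chunk)
all_vars = fragment_bytes(n_variables, chunk)
peak = peak_worker_bytes(n_variables, chunk, fragments_in_flight)

print(f"single-variable fragment: {one_var / GB:.1f} GB")
print(f"all-variable fragment:    {all_vars / GB:.1f} GB")
print(f"peak per worker:          {peak / GB:.1f} GB")
```

Under these assumptions the per-fragment footprint scales linearly with the number of variables, which would be consistent with the single-variable run succeeding while the full run does not.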

I tried just throwing more RAM at the problem (800GB RAM was not enough!!!), but this dataset is very large in total and I think eventually I would have to be able to load the whole thing into memory, which really is not the point of doing this.

My current suspicion is that for cases like this we might want to consider not only splitting fragments by dimension indices, but also splitting across variables. I am not at all sure how to achieve this, but wanted to record this as an interesting failure case.
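One way to picture splitting across variables is to partition the variable names into groups whose combined per-fragment size stays under a memory budget. This is only a sketch of the idea, not an existing pangeo-forge-recipes feature; the function name, the variable names, and the sizes are all hypothetical.

```python
from typing import Dict, List

MB = 1000**2


def group_variables_by_budget(
    var_sizes: Dict[str, int], budget_bytes: int
) -> List[List[str]]:
    """Greedily pack variables into groups that fit within budget_bytes.

    A variable larger than the budget still gets a group of its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in var_sizes.items():
        # Start a new group when adding this variable would exceed the budget.
        if current and current_bytes + size > budget_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


# Hypothetical dataset: 25 variables, each ~100MB per fragment.
var_sizes = {f"var_{i}": 100 * MB for i in range(25)}
groups = group_variables_by_budget(var_sizes, budget_bytes=500 * MB)

for i, group in enumerate(groups):
    print(i, group)
```

Each group could then be handled as its own smaller fragment, for example by selecting only those variables from the opened dataset (`ds[group]` in xarray) before further processing. How that would fit into the existing fragment and combine logic is an open question.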

moradology commented 4 months ago

"800GB RAM was not enough!!!" 😲