This dataset is different from many others in at least two ways AFAICT now:
- The files are extremely heavily compressed (3GB file, 17GB in memory)
- A TON of variables!
Together these blow up memory usage. I tested running the recipe with every variable but one dropped, and it works (it still consumes a lot of memory, but it succeeds).
I think the root of the problem is that a fragment with a ~100MB chunk size on a single variable is still extremely large (~2-3GB), so the workers try to load a bunch of them eagerly and blow up.
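A rough back-of-the-envelope sketch of that arithmetic, in case it helps. The variable count and the number of fragments a worker holds at once are assumptions picked for illustration (I have not measured either); only the ~100MB per-variable chunk size and the ~2-3GB fragment size come from what I observed.

```python
MB = 1000**2
GB = 1000**3


def fragment_bytes(chunk_bytes_per_variable: int, n_variables: int) -> int:
    """Estimate the in-memory size of one fragment that carries every variable."""
    return chunk_bytes_per_variable * n_variables


def worker_peak_bytes(fragment_size: int, fragments_in_flight: int) -> int:
    """Estimate worker memory if several fragments are loaded eagerly at once."""
    return fragment_size * fragments_in_flight


# Assumed values, for illustration only.
chunk = 100 * MB          # per-variable chunk size, as in the recipe
n_variables = 25          # hypothetical count that lands in the ~2-3GB range
fragments_in_flight = 8   # hypothetical number of fragments held by one worker

fragment = fragment_bytes(chunk, n_variables)
peak = worker_peak_bytes(fragment, fragments_in_flight)
print(f"fragment: {fragment / GB:.1f} GB, peak per worker: {peak / GB:.1f} GB")
```

The point is just that the per-variable chunk size is not what bounds memory here: the fragment size scales with the number of variables, and the worker footprint scales again with however many fragments get loaded eagerly.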
I tried just throwing more RAM at the problem (800GB RAM was not enough!), but this dataset is very large in total, and I suspect I would eventually need to be able to load the whole thing into memory, which really defeats the point of doing this.
My current suspicion is that for cases like this we might want to consider not only splitting fragments by dimension indices, but also splitting across variables. I am not at all sure how to achieve this, but I wanted to record it as an interesting failure case.
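To make the idea a bit more concrete, here is a minimal pure-Python sketch of what a fragment key could look like if it carried a variable group in addition to a dimension index. This is not an existing pangeo-forge-recipes feature; `variable_groups`, `fragment_keys`, and the variable names are all hypothetical and only illustrate the bookkeeping.

```python
from itertools import product
from typing import Iterator, List, Tuple


def variable_groups(variables: List[str], group_size: int) -> List[Tuple[str, ...]]:
    """Split the variable names into consecutive groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [
        tuple(variables[i : i + group_size])
        for i in range(0, len(variables), group_size)
    ]


def fragment_keys(
    n_time_chunks: int, variables: List[str], group_size: int
) -> Iterator[Tuple[int, Tuple[str, ...]]]:
    """Yield one key per (time chunk index, variable group) pair."""
    groups = variable_groups(variables, group_size)
    for time_index, group in product(range(n_time_chunks), groups):
        yield time_index, group


# Placeholder variable names, not the real ones from the dataset.
names = ["var_a", "var_b", "var_c", "var_d"]
keys = list(fragment_keys(n_time_chunks=2, variables=names, group_size=2))
for key in keys:
    print(key)
```

With keys like these, each fragment would hold only `group_size` variables for one index along the split dimension, so its size would no longer grow with the total number of variables. How to wire this into the actual pipeline is the open question.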
For context, I have been working on refactoring the community bakery at LEAP (#735), and this interesting problem case comes from https://github.com/leap-stc/wavewatch3_feedstock (particularly see the code in https://github.com/leap-stc/wavewatch3_feedstock/pull/1)