Closed: rabernat closed this issue 2 years ago.
Here is the issue at the algorithm level:
```python
from rechunker.algorithm import rechunking_plan
import dask

shape = (175320, 721, 1440)
source_chunks = (24, 721, 1440)
target_chunks = (21915, 103, 10)
itemsize = 4
max_mem = "12GB"

read_chunks, int_chunks, write_chunks = rechunking_plan(
    shape,
    source_chunks,
    target_chunks,
    itemsize,
    dask.utils.parse_bytes(max_mem),
    consolidate_reads=True,
)
print(read_chunks, int_chunks, write_chunks)

read_chunks2, int_chunks2, write_chunks2 = rechunking_plan(
    shape,
    int_chunks,
    target_chunks,
    itemsize,
    dask.utils.parse_bytes(max_mem),
    consolidate_reads=True,
)
print(read_chunks2, int_chunks2, write_chunks2)
```
```
(2880, 721, 1440) (2880, 103, 1320) (21915, 103, 1320)
(20160, 103, 1320) (20160, 103, 1320) (21915, 103, 1320)
```
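As a sanity check (my own arithmetic, not part of the original report), the per-chunk memory implied by these plans can be computed directly. Every individual chunk shape fits under the 12 GB budget, so the per-chunk bound alone does not explain the blowup observed below:

```python
# Per-chunk memory for each chunk shape in the two plans, with itemsize=4.
# dask.utils.parse_bytes("12GB") == 12_000_000_000 (decimal GB), inlined
# here to keep the snippet dependency-free.
MAX_MEM = 12_000_000_000
ITEMSIZE = 4

def chunk_nbytes(chunks, itemsize=ITEMSIZE):
    """Bytes occupied by a single chunk of the given shape."""
    n = itemsize
    for c in chunks:
        n *= c
    return n

plan_chunks = {
    "read_chunks": (2880, 721, 1440),
    "int_chunks": (2880, 103, 1320),
    "write_chunks": (21915, 103, 1320),
    "read_chunks2": (20160, 103, 1320),
}
for name, chunks in plan_chunks.items():
    nb = chunk_nbytes(chunks)
    print(f"{name}: {nb:,} bytes, fits={nb <= MAX_MEM}")  # fits=True for all
```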
`int_chunks2` should be `None`.
I started from an array of shape (175320, 721, 1440) with chunks (24, 721, 1440) and created the rechunking plan shown above, which produced the output shown. When executing this plan, the first stage (source to intermediate) ran fine, but the second stage (intermediate to target) exceeded the memory limit by about 2x or more.

I then tried rechunking directly from the intermediate chunks to the target chunks. I expected this to happen without an additional intermediate; that's how it's supposed to work. But it didn't!
This suggests there may be a bug in our memory allocation logic.
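One possible reading (my speculation, not confirmed by the report): the plan appears to bound each chunk's size against `max_mem` individually, but a task in the intermediate-to-target stage that materializes one read (intermediate) chunk and one write chunk at the same time needs their sum in memory, and for the plan above that sum already exceeds the 12 GB budget:

```python
# Hypothetical combined-memory check for the second stage of the plan above:
# one intermediate chunk held for reading plus one write chunk being assembled.
ITEMSIZE = 4
INT_CHUNK = (2880, 103, 1320)     # int_chunks from the plan
WRITE_CHUNK = (21915, 103, 1320)  # write_chunks from the plan

def nbytes(shape, itemsize=ITEMSIZE):
    n = itemsize
    for c in shape:
        n *= c
    return n

combined = nbytes(INT_CHUNK) + nbytes(WRITE_CHUNK)
max_mem = 12_000_000_000  # dask.utils.parse_bytes("12GB")
print(combined, combined > max_mem)  # 13484512800 True
```

Whether this is the actual mechanism behind the observed blowup would need to be verified against how the executor schedules the intermediate-to-target tasks.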