pangeo-data / rechunker

Disk-to-disk chunk transformation for chunked arrays.
https://rechunker.readthedocs.io/
MIT License

Avoid unnecessary copy to intermediate store #150

Closed ghiggi closed 7 months ago

ghiggi commented 7 months ago

Hi @rabernat !

I think I found a case where rechunker unnecessarily creates copies in the intermediate store. It's a small fix, but it avoids wasting computation when the rechunking can be done without intermediate copies to disk.

Below is a reproducible example of when this can occur:

import numpy as np
import dask.utils
from rechunker.algorithm import rechunking_plan

# float32 -> 4 bytes per element
dtype = np.dtype("float32")
itemsize = dtype.itemsize

shape = (1000, 2000, 2000)
source_chunks = (1, 2000, 2000)  # one full 2D slice per chunk
target_chunks = (1000, 4, 4)  # full first axis, small tiles on the other two

max_mem = dask.utils.parse_bytes("20 GB")

read_chunks, int_chunks, write_chunks = rechunking_plan(
    shape=shape,
    source_chunks=source_chunks,
    target_chunks=target_chunks,
    itemsize=itemsize,
    max_mem=max_mem,
    consolidate_reads=False,
)

print(read_chunks)
print(int_chunks)
print(write_chunks)
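
To put the numbers in the example above in perspective, here is a rough size check in plain Python. It is only a sketch: `chunk_nbytes` and `intermediate_is_redundant` are hypothetical helpers written for illustration, not part of rechunker's API, and the redundancy rule is an assumption about when an intermediate copy adds nothing, not a description of the actual fix. `max_mem` is hard-coded as 20 * 10**9 on the assumption that "20 GB" is read in decimal units.

```python
from math import prod


def chunk_nbytes(chunks, itemsize):
    """Bytes needed to hold one chunk of the given shape in memory."""
    return prod(chunks) * itemsize


itemsize = 4  # float32
shape = (1000, 2000, 2000)
source_chunks = (1, 2000, 2000)
target_chunks = (1000, 4, 4)
max_mem = 20 * 10**9  # assumption: "20 GB" in decimal units

array_nbytes = chunk_nbytes(shape, itemsize)
source_nbytes = chunk_nbytes(source_chunks, itemsize)
target_nbytes = chunk_nbytes(target_chunks, itemsize)

print(f"full array:   {array_nbytes:,} bytes")
print(f"source chunk: {source_nbytes:,} bytes")
print(f"target chunk: {target_nbytes:,} bytes")
print("whole array fits in max_mem:", array_nbytes <= max_mem)


def intermediate_is_redundant(read_chunks, int_chunks, write_chunks):
    """Hypothetical check (not rechunker's implementation).

    Assumes that an intermediate store whose chunks equal either the
    read chunks or the write chunks adds nothing over a direct copy.
    """
    return tuple(int_chunks) in (tuple(read_chunks), tuple(write_chunks))
```

Since the whole array is smaller than `max_mem` here, a single read-then-write pass looks feasible in principle, which is why an extra copy to the intermediate store would be wasted work in this case.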

rabernat commented 7 months ago

Thanks for this! Please add a test.

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (ba7efc0) 70.03% compared to head (bddeb19) 96.12%. Report is 1 commit behind head on master.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           master     #150       +/-   ##
===========================================
+ Coverage   70.03%   96.12%   +26.09%
===========================================
  Files          11       11
  Lines         554      568       +14
  Branches      106      113        +7
===========================================
+ Hits          388      546      +158
+ Misses        149       14      -135
+ Partials       17        8        -9
```


ghiggi commented 7 months ago

@rabernat done ;)

rabernat commented 7 months ago

Great, thank you. I don't know why the doc build is failing. Everything else looks good.

ghiggi commented 4 months ago

Hey @rabernat. Sorry to bother you. Would it be possible to make a new package release with this PR included? Let me know if I can help in some way ;)