pangeo-data / rechunker

Disk-to-disk chunk transformation for chunked arrays.
https://rechunker.readthedocs.io/
MIT License
163 stars 25 forks source link

Refactor the data model for excutors #38

Closed shoyer closed 4 years ago

shoyer commented 4 years ago

I originally thought that it made the most sense to think of rechunking as a chain of chunked array copies.

After going through the exercise of trying to write an Executor that doesn't use temporary arrays (https://github.com/pangeo-data/rechunker/pull/36), I realize now that this was overly generic. Sometimes it might make sense to implement a rechunking this way, but Rechunker's algorithm adds more structure than that. There is always a "push/pull" structure that splits read chunks into intermediate chunks, and then combines intermediate chunks into write chunks.

This structure wasn't evident from current data model and I needed it for my executor without temporaries, so this PR changes CopySpec to directly keep track of "read/intermediate/write" steps.

I've used this new representation in api.py and the dask executor. It's still convenient sometimes to use the "chain of copies" representation, so I've added the utility function split_into_direct_copies() for converting to this representation in rechunker.executors.util.

codecov[bot] commented 4 years ago

Codecov Report

Merging #38 into master will increase coverage by 0.16%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
+ Coverage   93.39%   93.55%   +0.16%     
==========================================
  Files           7        7              
  Lines         318      326       +8     
  Branches       65       66       +1     
==========================================
+ Hits          297      305       +8     
  Misses         10       10              
  Partials       11       11              
Impacted Files Coverage Δ
rechunker/api.py 90.75% <100.00%> (+0.23%) :arrow_up:
rechunker/executors/beam.py 100.00% <100.00%> (ø)
rechunker/executors/dask.py 100.00% <100.00%> (ø)
rechunker/executors/python.py 100.00% <100.00%> (ø)
rechunker/executors/util.py 100.00% <100.00%> (ø)
rechunker/types.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 22e6ca0...278e270. Read the comment docs.

TomAugspurger commented 4 years ago

Thanks @shoyer!