Open davidbrochart opened 4 years ago
Merging #28 into master will increase coverage by 0.75%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #28 +/- ##
==========================================
+ Coverage 88.94% 89.70% +0.75%
==========================================
Files 2 2
Lines 190 204 +14
Branches 44 50 +6
==========================================
+ Hits 169 183 +14
Misses 11 11
Partials 10 10
Impacted Files | Coverage Δ | |
---|---|---|
rechunker/api.py | 93.52% <100.00%> (+0.72%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fdddf0f...4506e97.
Thanks a lot for this PR @davidbrochart! I really appreciate your contribution. I will try to give a thorough review in the next few days.
I would wait on further action on this until #30 is merged. That is a pretty significant refactor to the internal structure of the package.
Yes, I agree.
@davidbrochart, now that #30 is done, we might want to revisit this.
Perhaps @shoyer has some ideas about how to best incorporate incremental rechunking / appending into the new code structure.
Again it seems like xarray's lazy indexing adaptors could come in very handy.
@rabernat do you mean that rechunker would depend on xarray, or that xarray's lazy indexing logic would be pulled into rechunker's code?
Was there any progress on this since then?
Hi @rsemlal-murmuration - turns out that incremental rechunking is pretty tricky (lots of edge cases)! There hasn't been any work on this recently in rechunker.
However, at Earthmover, we are exploring many different approaches to this problem currently.
Understood! Thanks for the quick reply!
Looking into this as well at the moment.
The workaround we are considering: using rechunker to write the data slice into a new intermediate location, then appending it from there to the existing dataset using xarray.to_zarr(mode="a"). But it is obviously not the most efficient approach.
Would be interested if there are other approaches/workarounds out there.
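For readers following along, here is a rough sketch of why the append step in a workaround like the one above can be costly, and of one edge case incremental rechunking probably has to handle. It is a toy model in plain Python, not rechunker or xarray code: it assumes a regular chunk grid along the append dimension, and the names `AppendPlan` and `plan_append` are hypothetical, made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppendPlan:
    """Chunks along the append dimension that a write would touch (toy model)."""

    first_chunk: int         # index of the first chunk the append writes to
    last_chunk: int          # index of the last chunk the append writes to
    rewrites_existing: bool  # True if first_chunk already holds some data


def plan_append(existing_len: int, append_len: int, chunk_size: int) -> AppendPlan:
    """Work out which chunks are touched when appending to a chunked axis.

    Assumes a regular chunk grid: chunk i covers the index range
    [i * chunk_size, (i + 1) * chunk_size) along the append dimension.
    """
    if existing_len < 0 or append_len <= 0 or chunk_size <= 0:
        raise ValueError("lengths must be non-negative and sizes positive")

    # The first new element lands at index existing_len.
    first_chunk = existing_len // chunk_size
    # The last new element lands at index existing_len + append_len - 1.
    last_chunk = (existing_len + append_len - 1) // chunk_size
    # If the existing data does not end on a chunk boundary, the final
    # existing chunk is only partially filled and must be read, merged with
    # the new data, and written again.
    rewrites_existing = existing_len % chunk_size != 0
    return AppendPlan(first_chunk, last_chunk, rewrites_existing)


if __name__ == "__main__":
    # Existing data ends exactly on a chunk boundary: only new chunks are written.
    print(plan_append(existing_len=100, append_len=30, chunk_size=50))
    # Existing data ends mid-chunk: the partially filled chunk is rewritten.
    print(plan_append(existing_len=120, append_len=30, chunk_size=50))
```

In this model an append is cheap only when the existing data ends on a chunk boundary. Otherwise the last existing chunk has to be rewritten, which is one plausible source of the edge cases mentioned earlier in the thread.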
This is a limited implementation of incremental rechunking. There is still a lot to do, but I'd like to get early feedback on the approach. Closes #8