Closed · tomwhite closed this 4 years ago
Yes, in fact, this was discussed in #4! However, for the first release, we opted for only supporting zarr arrays / groups.

The big question for dask arrays is whether to try to perform "read consolidation" on the input by calling `.rechunk` on the dask array. @TomAugspurger suggested that this might not work optimally in all cases. There is an option to use `consolidate_reads=False` in `rechunking_plan`; we might want to enable this for dask inputs.

We would love a PR if you wanted to implement this.
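For reference, here is a minimal sketch of what disabling read consolidation might look like at the planning level. It assumes `rechunking_plan` lives in `rechunker.algorithm` and takes roughly the signature shown; the shapes, `itemsize`, and `max_mem` values are illustrative:

```python
# Sketch only: assumes rechunker.algorithm.rechunking_plan with roughly this
# signature; the shapes, itemsize, and max_mem values are illustrative.
from rechunker.algorithm import rechunking_plan

read_chunks, int_chunks, write_chunks = rechunking_plan(
    shape=(4000, 4000),
    source_chunks=(800, 4000),   # chunks of the input dask array
    target_chunks=(400, 4000),   # desired chunks of the output zarr array
    itemsize=8,                  # bytes per element (float64)
    max_mem=256_000_000,         # per-task memory budget, in bytes
    consolidate_reads=False,     # skip the .rechunk-based read consolidation
)
```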
Thanks @rabernat. I will see if I can write a PR for this.
https://github.com/dask/dask/issues/6272 is the (potential) issue with rechunking the arrays. I haven't verified that it's actually an issue in practice, though.
Fixed in #27
Thank you for all your work on this library! In testing it I have found it useful to create random persistent Zarr arrays with different chunk sizes. It struck me that this is just an application of `rechunk`, where the input is a Dask array rather than a Zarr array.

For example, if I want to create an array of size `(4000, 4000)` with 10 chunks of size `(400, 4000)`, using 5 tasks where each task creates 2 chunks (by making the source chunks `(800, 4000)`), then it would be nice to do this with something like the sketch below. The read and write chunk sizes would be the same in this case, so it's a simple pass-through with no intermediate store.
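A hypothetical sketch of what that call might look like, using the public `rechunker.rechunk` entry point; the `target_store` path and `max_mem` budget are made up for illustration:

```python
# Hypothetical sketch: what passing a dask array to rechunk might look like.
# The target_store path and max_mem budget are illustrative.
import dask.array as da
from rechunker import rechunk

source = da.random.random((4000, 4000), chunks=(800, 4000))  # 5 source chunks
plan = rechunk(
    source,                     # a dask array instead of a zarr array
    target_chunks=(400, 4000),  # 10 chunks in the persistent output
    max_mem="1GB",
    target_store="random.zarr",
)
plan.execute()
```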
Would it be possible to make `rechunk` cater for this by accepting a Dask array?