These could facilitate directly opening data from Zarr using idiomatic patterns in Xarray-Beam (e.g., using Xarray's lazy indexing machinery instead of dask).
I'm imagining `xbeam.open_zarr()` returning a tuple of values `(transform, template, chunks)`, providing exactly the information needed to use the dataset in a Zarr-to-Zarr pipeline:
- `transform` would be the Beam `PTransform` to use in a pipeline (equivalent to the result of `xbeam.ZarrToChunks()`).
- `template` would be an efficient, lazy `xarray.Dataset` in which each variable is a single dask chunk, e.g., equivalent to `xarray.zeros_like(xarray.open_zarr(..., chunks=None).chunk())`.
- `chunks` would be a dict of the chunk sizes used by the underlying Zarr dataset, mapping dimension names to sizes.
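To make the `chunks` element concrete, here is a minimal sketch of how it could be assembled, assuming chunk sizes are taken from each variable's `encoding["chunks"]`, which xarray's Zarr backend fills in when a store is opened (e.g. with `xarray.open_zarr(..., chunks=None)`). The helper name `zarr_chunks_from_encoding` is hypothetical, and it only relies on `.dims` and `.encoding`, so it can be exercised without a real store:

```python
from types import SimpleNamespace
from typing import Any, Dict, Mapping


def zarr_chunks_from_encoding(variables: Mapping[str, Any]) -> Dict[str, int]:
    """Collect one Zarr chunk size per dimension (hypothetical helper).

    `variables` is assumed to behave like `xarray.Dataset.variables`: each
    value has `.dims` and an `.encoding` dict that may contain "chunks".
    """
    chunks: Dict[str, int] = {}
    for name, var in variables.items():
        enc_chunks = var.encoding.get("chunks")
        if enc_chunks is None:
            continue  # e.g. variables that were not read from Zarr
        for dim, size in zip(var.dims, enc_chunks):
            # Record the first size seen for each dim; reject later mismatches.
            if chunks.setdefault(dim, size) != size:
                raise ValueError(
                    f"inconsistent chunks for dim {dim!r} in {name!r}: "
                    f"{size} vs {chunks[dim]}"
                )
    return chunks


# Stand-ins for xarray variables, mimicking what open_zarr() records.
example = {
    "temperature": SimpleNamespace(
        dims=("time", "lat", "lon"), encoding={"chunks": (10, 90, 180)}
    ),
    "lat": SimpleNamespace(dims=("lat",), encoding={"chunks": (90,)}),
    "time": SimpleNamespace(dims=("time",), encoding={}),
}
print(zarr_chunks_from_encoding(example))
```

With a real store this would be called as `zarr_chunks_from_encoding(xarray.open_zarr(path, chunks=None).variables)`. Raising on disagreeing sizes is a design choice: Zarr allows different variables to be chunked differently along the same dimension, and a single dict cannot represent that.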
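Usage examples:

```python
import apache_beam as beam
import xarray_beam as xbeam

# Read from Zarr, re-chunking to explicitly chosen chunk sizes.
with beam.Pipeline() as p:
    p | xbeam.ZarrToChunks(..., desired_chunks) | ...

# Read with the store's original chunks, transform, and write back with them.
with beam.Pipeline() as p:
    load_data, template, original_chunks = xbeam.open_zarr(...)
    p | load_data | beam.MapTuple(...) | xbeam.ChunksToZarr(..., template, original_chunks)
```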