jhamman commented 1 week ago

What is your issue?

Zarr-Python 3.0 is getting close to a full release. This issue tracks the integration of the 3.0 release with Xarray.

Here's a running list of issues we're solving upstream related to integration with Xarray:

Special shout out to @TomAugspurger, who has been front running a lot of this 🙌.

TomAugspurger commented 1 week ago

High Level Plan

We can think about a couple lines of related work:

Get xarray working with zarr-python 3.x (read / write Zarr v2 data)
Support Zarr v3

I think that supporting zarr-python 3.x is the primary goal for now.

Work Items

These PRs are needed on top of zarr-python v3 to get anything working:

[ ] Consolidated metadata: https://github.com/zarr-developers/zarr-python/pull/2113
[ ] v2 array metadata: https://github.com/zarr-developers/zarr-python/pull/2270

These are some issues we'll need to resolve:

Changes for zarr-python:

[ ] Support for more dtypes: https://github.com/zarr-developers/zarr-python/issues/2153
[ ] Change default fill values?: https://github.com/zarr-developers/zarr-python/issues/2265 (related to #5475)
[ ] Fix Group.__getitem__ for arrays at the top-level: https://github.com/zarr-developers/zarr-python/pull/2272

Changes for xarray:

Pass through zarr_format=2/3/None in all public read / write APIs.
[ ] _FillValue and Zarr's fill_value causing valid values to be cast to NaN: https://github.com/pydata/xarray/issues/5475 (this could use feedback from the xarray maintainers).
[ ] Update for filters / compressor -> codecs change: https://github.com/zarr-developers/zarr-python/issues/2194 (most likely will require changes in xarray).
[ ] Update zarr.Blosc imports to numcodecs.Blosc (can be done anytime)
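
To make the `zarr_format=2/3/None` pass-through item above a bit more concrete, here is a rough sketch of the pattern. This is not xarray's or zarr-python's code: `resolve_zarr_format`, `write_dataset`, the dict-based store, and the `library_default=3` value are all hypothetical stand-ins chosen for illustration. The idea is just that `None` defers to a default, while an explicit 2 or 3 is validated and forwarded unchanged.

```python
from typing import Optional

# Formats discussed in this thread; anything else is rejected.
SUPPORTED_ZARR_FORMATS = (2, 3)


def resolve_zarr_format(zarr_format: Optional[int], library_default: int = 3) -> int:
    """Turn a user-facing zarr_format argument into a concrete format.

    None means "defer to the default". The default of 3 is an assumption
    made for this sketch, not a statement about any library.
    """
    if zarr_format is None:
        return library_default
    if zarr_format not in SUPPORTED_ZARR_FORMATS:
        raise ValueError(
            f"zarr_format must be one of {SUPPORTED_ZARR_FORMATS} or None, "
            f"got {zarr_format!r}"
        )
    return zarr_format


def write_dataset(store: dict, data: dict, zarr_format: Optional[int] = None) -> int:
    """Hypothetical public write API that passes zarr_format straight through.

    The store is a plain dict so the sketch stays self-contained.
    """
    fmt = resolve_zarr_format(zarr_format)
    store["zarr_format"] = fmt
    store["data"] = dict(data)
    return fmt


store = {}
write_dataset(store, {"temperature": [1.0, 2.0]}, zarr_format=2)
print(store["zarr_format"])
```

Keeping the resolution in one helper means every read / write entry point can accept the same argument and behave consistently, which is the point of the work item.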

Fixed issues

[x] basic zarr-python 2.x compatibility: https://github.com/zarr-developers/zarr-python/pull/2098
[x] Attributes.asdict: https://github.com/zarr-developers/zarr-python/pull/2221
[x] Fixed codec pipeline for zarr-v2: https://github.com/zarr-developers/zarr-python/pull/2244
[x] (nice to have): https://github.com/zarr-developers/zarr-python/pull/2249
[x] Create intermediate Groups when creating a nested node: https://github.com/zarr-developers/zarr-python/pull/2262

Things to investigate:

separate store / chunk_store
writing a subset of regions

dcherian commented 3 days ago

@TomAugspurger are you able to open a WIP PR with the in-progress work? It'd be nice to see what's needed.

TomAugspurger commented 3 days ago

Sure, https://github.com/pydata/xarray/pull/9552 has that.

TomAugspurger commented 17 hours ago

Question for the group: does anyone object to xarray continuing to write Zarr V2 datasets by default? I hesitate to have xarray's default be different from zarr-python's, but that would relieve some pressure to address https://github.com/pydata/xarray/issues/5475 quickly, since v2 datasets should be round-trippable.
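
For anyone who hasn't followed #5475, here is a deliberately simplified, pure-Python illustration of the failure mode summarized in the work items above (valid values being cast to NaN). It is not xarray's decoding code: `decode_with_fill_value` is a hypothetical helper, and the assumption is only that a store-level fill value gets treated as a missing-data marker when decoding.

```python
import math


def decode_with_fill_value(values, fill_value):
    """Replace every element equal to fill_value with NaN.

    Hypothetical stand-in for mask-style decoding, for illustration only.
    """
    return [math.nan if v == fill_value else float(v) for v in values]


# Zeros here are real measurements, not missing data.
stored = [0, 1, 2, 0]

# If the fill value happens to be 0 and is interpreted as "missing",
# the valid zeros are masked on read.
decoded = decode_with_fill_value(stored, fill_value=0)
lost = sum(math.isnan(v) for v in decoded)
print(decoded, lost)
```

The data no longer round-trips in this sketch, which is why the choice of default fill value and default format are tied together in this discussion.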

pydata / xarray

Zarr Python 3 tracking issue #9515