pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.63k stars 1.09k forks source link

fixing behaviour for group parameter in `open_datatree` #9666

Closed aladinor closed 1 month ago

aladinor commented 1 month ago

Hi all.

This might be more complex than pruning the path in the open_group_as_dict function. It is kind of complex because when we use _iter_zarr_groups, it yields the root group. I am still working o it

aladinor commented 1 month ago

@TomNicholas, I think this solution will work, but the DataTree.from_dict(groups_dict) function will create the root as an empty node. I think we might need to check it out. Let me know your thoughts.

aladinor commented 1 month ago

I think we are getting close. However, we are still having some discrepancies when comparing both datatrees using the group parameter and when directly selecting it via path. Example:

print(dtree["/group2/subg1"])
Group: /group2/subg1 <----- different root paths here
│   Dimensions:  (x: 2, y: 3)
│   Inherited coordinates:
│     * x        (x) int64 16B -1 -2
│     * y        (y) int64 24B 0 1 2
│   Data variables:
│       blah     (x) int64 16B 2 3
├── Group: /group2/subg1/subsub1 <----- different paths here
│       Dimensions:  (y: 3)
│       Data variables:
│           var      (y) int64 24B 4 5 6
└── Group: /group2/subg1/subsub2 <----- different  paths here

dt2 = xr.open_datatree("test.zarr",
                          group="/group2/subg1"

print(dt2)
Group: /  <----- different root paths here
│   Dimensions:  (x: 2)
│   Dimensions without coordinates: x
│   Data variables:
│       blah     (x) int64 16B ...
├── Group: /subsub1 <----- different  paths here
│       Dimensions:  (y: 3)
│       Dimensions without coordinates: y
│       Data variables:
│           var      (y) int64 24B ...
└── Group: /subsub2 <----- different  paths here

Any comments on this @TomNicholas @keewis?

keewis commented 1 month ago

this is by design, I think? I'd interpret group="/group2/subgroup1" as saying "give me that group as the new root group". Then tree.encoding["source_group"] can contain the full path of the new root.

(if you know unix commands, this would be similar to chroot)

TomNicholas commented 1 month ago

discrepancies when comparing both datatrees using the group parameter and when directly selecting it via path

The reprs are different because the objects are different: dtree["/group2/subg1"] has a parent, whereas dt2 does not. So this is intended.

TomNicholas commented 1 month ago

The test should look very much like the ones in #9669 - create a tiny nested tree, save it to zarr/netcdf, open it with the group parameter, and check the structure is as expected.

TomNicholas commented 1 month ago

Sorry @aladinor - in fact could we just do this? https://github.com/pydata/xarray/pull/9666#discussion_r1815590340

TomNicholas commented 1 month ago

This looks good! @keewis 's comments are addressed so I'm going to merge it.

keewis commented 1 month ago

(you can still add yourself to the list of contributors to the DataTree entry in the whats-new, @aladinor Oh, and me too, apparently :sweat_smile:)

Edit: see https://github.com/pydata/xarray/blob/5b2e6f1895c8c31dd2b3cecf1117f2b166e14b9a/doc/whats-new.rst?plain=1#L24-L32

keewis commented 1 month ago

Otherwise @TomNicholas can do that while preparing for the release.

TomNicholas commented 1 month ago

Amazing thank you! And thanks for pointing that out so everyone involved can get credit for these great contributions!

aladinor commented 1 month ago

Thanks @TomNicholas and @keewis for your guidance!