pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.57k stars 1.07k forks source link

Dataset.dims should return a set, not a dict of sizes #8496

Open TomNicholas opened 10 months ago

TomNicholas commented 10 months ago

What is your issue?

This is inconsistent:

In [25]: ds
Out[25]: 
<xarray.Dataset>
Dimensions:  (x: 1, y: 2)
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) int64 0 1

In [26]: ds['a'].dims
Out[26]: ('x', 'y')

In [27]: ds['a'].sizes
Out[27]: Frozen({'x': 1, 'y': 2})

In [28]: ds.dims
Out[28]: Frozen({'x': 1, 'y': 2})

In [29]: ds.sizes
Out[29]: Frozen({'x': 1, 'y': 2})

Surely ds.dims should return something like a Frozenset({'x', 'y'})? (because dimension order is meaningless when you have multiple arrays underneath - see https://github.com/pydata/xarray/issues/8498)

max-sixty commented 10 months ago

I agree — and given there is a clear alternative to ds.dims — ds.sizes, it would be reasonable to put a FutureWarning on it and then change it in a few versions.

(I thought there might be an issue for this but couldn't find one...)

TomNicholas commented 10 months ago

This is actually more than 5 minutes of effort: once I add the FutureWarning, then a huge number of warnings will be raised by internals. Some of those warnings then break tests that assert_no_warnings.

To remove these warnings I have to find every location where .dims was called but a dict was intended (but not places where .dims was called and only the keys used). I can't simply grep for that...

TomNicholas commented 10 months ago

Also relevant previous issues (very old): #921 and #1076

TomNicholas commented 10 months ago

Also if I warn on ds.dims, then even for future-proof uses of ds.dims (e.g for d in ds.dims) the only way for the user to remove the warning is to use ds.sizes.keys(), but that's not actually what we want them to do eventually!

On Thu, Nov 30, 2023, 5:47 PM Maximilian Roos @.***> wrote:

I agree — and given there is a clear alternative to ds.dims, it would be reasonable to put a FutureWarning on it and then change it in a few versions.

(I thought there might be an issue for this but couldn't find one...)

— Reply to this email directly, view it on GitHub https://github.com/pydata/xarray/issues/8496#issuecomment-1834680374, or unsubscribe https://github.com/notifications/unsubscribe-auth/AISNPI5TSOV72JSNVHOFYQLYHEEJFAVCNFSM6AAAAABAB43R26VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMZUGY4DAMZXGQ . You are receiving this because you authored the thread.Message ID: @.***>

dcherian commented 10 months ago

Yeah that struck me too. It seems like a custom Mapping object would be the way to do. Raise warnings on __getitem__ but not on __iter__.

TomNicholas commented 10 months ago

Oh that's clever! I did not think of that.

dcherian commented 10 months ago

then a huge number of warnings will be raised by internals.

One insight I've learned form Stephan on this repo is to treat warnings in the test suite as a metric for the impact on users. This one seems pretty big! =)

TomNicholas commented 10 months ago

This one seems pretty big! =)

I hope that's a good thing 😅

@dcherian I think that ds.groupby.dims should also be changed for consistency too FYI.

TomNicholas commented 2 weeks ago

Are we ready to complete this deprecation cycle? I ask because it would be nice to release DataTree with DataTree.dims able to be start with the new intended behaviour...