pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Explicit indexes: next steps #6293

Open benbovy opened 2 years ago

benbovy commented 2 years ago

#5692 is now merged, and we can already start thinking about the next steps. I’m opening this issue to list and track the remaining tasks. @pydata/xarray, do not hesitate to add a comment below if you think of something that is missing here.

Continue the refactoring of the internals

Although in #5692 everything seems to work with the current pandas index wrappers for dimension coordinates, not all of Xarray's internals have been refactored yet to fully support (or at least be compatible with) custom indexes. Here is a list of Dataset / DataArray methods that still need to be checked / updated (this list may be incomplete):

I ended up following a common pattern in #5692 when adding explicit / flexible index support for various features (it is quite generic, though: the actual procedure may vary from one case to another, and many steps may be skipped):

Relax all constraints related to “dimension (index) coordinates” in Xarray

Indexes repr

Public API for assigning and (re)setting indexes

There is no public API yet for creating and/or assigning existing indexes to Dataset and DataArray objects.

We still need to figure out how best to (1) assign existing indexes (possibly with their coordinates) and (2) pass index build options.
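To make requirements (1) and (2) a bit more concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and exists only for illustration: `ToyIndex`, `ToyDataset`, `set_index_from_coords`, `assign_index` and the `leaf_size` / `freq` options are made-up names, not an existing or proposed Xarray API. It only shows the two code paths that a public API would probably need to distinguish: building an index from coordinates already on the object (forwarding build options), and attaching an index that was built elsewhere together with its coordinates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyIndex:
    """Hypothetical stand-in for a custom index (NOT Xarray's Index class)."""

    coord_names: tuple[str, ...]
    options: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_coords(cls, coords: dict[str, list], **options: Any) -> ToyIndex:
        # (2) build options are forwarded unchanged to the index constructor.
        return cls(tuple(coords), dict(options))


@dataclass
class ToyDataset:
    """Hypothetical container: coordinates plus a coordinate-name -> index map."""

    coords: dict[str, list] = field(default_factory=dict)
    indexes: dict[str, ToyIndex] = field(default_factory=dict)

    def set_index_from_coords(self, names, index_cls=ToyIndex, **options):
        # Build a new index from coordinates that already exist on the object.
        missing = [n for n in names if n not in self.coords]
        if missing:
            raise KeyError(f"unknown coordinates: {missing}")
        index = index_cls.from_coords({n: self.coords[n] for n in names}, **options)
        return self._with_index(index, {})

    def assign_index(self, index, coords):
        # (1) attach an index built elsewhere, together with its coordinates.
        if set(index.coord_names) != set(coords):
            raise ValueError("index and coordinates do not match")
        return self._with_index(index, coords)

    def _with_index(self, index, new_coords):
        # Return a new object; all coordinates of an index map to the same index.
        coords = {**self.coords, **new_coords}
        indexes = {**self.indexes, **{n: index for n in index.coord_names}}
        return ToyDataset(coords, indexes)


ds = ToyDataset(coords={"lat": [10.0, 20.0], "lon": [0.0, 5.0]})

# One index shared by two coordinates, created with a build option.
ds2 = ds.set_index_from_coords(["lat", "lon"], leaf_size=16)

# An index created independently, assigned along with its coordinate.
time_index = ToyIndex(("time",), {"freq": "D"})
ds3 = ds2.assign_index(time_index, {"time": [0, 1, 2]})
```

In this sketch both methods return a new object rather than mutating in place, and a multi-coordinate index is stored once per coordinate name. Whether a real API should work that way is exactly the kind of question left open above.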

Other public API for index-based operations

To fully leverage the power and flexibility of custom indexes, we might want to update some parts of Xarray’s public API in order to allow passing arbitrary options per index. For example:

Also:

Documentation

Index types and helper classes built in Xarray

3rd party indexes

benbovy commented 2 years ago

Following thoughts and discussions in various issues (e.g., #6836), I'd like to suggest adding another section to those in the top comment:

Deprecate pandas.MultiIndex special cases in Xarray

They are a source of many problems and complexities in Xarray internals (many regressions reported since the index refactor were related to those special cases), and I'm not sure that the value they add is really worth the trouble. Also, in the long term, the special treatment of PandasMultiIndex vs. other Xarray multi-indexes may cause some confusion.

Some of those features are widely used (e.g., the creation of Dataset / DataArray from pandas multi-indexes is used in many places in unit tests), so we would need convenient alternatives and a smooth transition.

shoyer commented 2 years ago

Yes yes -- the sooner we can get rid of MultiIndex special cases the better!

ChrisBarker-NOAA commented 1 year ago

Any progress on this? I'd love to see #2233 get resolved.