pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Unexpected dim order behavior #9718

Open openSourcerer9000 opened 2 weeks ago

openSourcerer9000 commented 2 weeks ago

Edit: see below for updated request.

Is your feature request related to a problem?

The rest of the SciPy ecosystem requires NumPy arrays for everything, and dim order is the only organization you have in NumPy. It causes tons of issues moving back and forth between xarray and NumPy, simply because dimensions are displayed in alphabetical order rather than in their actual order.
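For context, a minimal sketch of how a single DataArray's dim order maps onto NumPy axes. The array, its dim names and its shape are hypothetical; `DataArray.dims`, `get_axis_num`, `transpose` and `to_numpy` are existing xarray methods.

```python
import numpy as np
import xarray as xr

# Hypothetical array: the dims tuple is the actual axis order of the NumPy data.
da = xr.DataArray(np.arange(24).reshape(2, 3, 4), dims=("time", "y", "x"))

print(da.dims)               # ('time', 'y', 'x')
print(da.to_numpy().shape)   # (2, 3, 4): axis i of the array is da.dims[i]
print(da.get_axis_num("x"))  # 2

# To hand NumPy/SciPy a specific axis order, transpose by name before exporting.
arr = da.transpose("x", "y", "time").to_numpy()
print(arr.shape)             # (4, 3, 2)
```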

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

openSourcerer9000 commented 2 weeks ago

NVM, it shows the correct order next to the data var.

kmuehlbauer commented 2 weeks ago

@openSourcerer9000 Thanks for raising this.

The big question is: What is the actual order?

You can imagine any DataArrays in a Dataset (arrays and their dimensions):

A(time, range, sample)
B(time, sample)
C(time, size, range)
D(size, range)

So how to order those?
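To make the question concrete, here is a sketch that builds the four variables above into one Dataset. The sizes and the `make` helper are hypothetical, chosen only so the dims line up. Each variable keeps its own order, while the Dataset itself only records which dims exist and how long they are.

```python
import numpy as np
import xarray as xr

# Hypothetical sizes for the dims used in the comment above.
sizes = {"time": 4, "range": 3, "sample": 2, "size": 5}


def make(*dims):
    """Hypothetical helper: a zero-filled DataArray with the given dims."""
    return xr.DataArray(np.zeros([sizes[d] for d in dims]), dims=dims)


ds = xr.Dataset(
    {
        "A": make("time", "range", "sample"),
        "B": make("time", "sample"),
        "C": make("time", "size", "range"),
        "D": make("size", "range"),
    }
)

# Every data variable has its own, possibly different, dimension order...
for name, var in ds.data_vars.items():
    print(name, var.dims)

# ...but at the Dataset level there is only a mapping of dim name -> length.
print(dict(ds.sizes))
```

No single ordering of `time`, `range`, `sample` and `size` matches all four variables, which is the ambiguity being pointed out.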

openSourcerer9000 commented 1 week ago

It's still messing me up that a Dataset does not have a dim order. ds.transpose doesn't seem to actually do anything. I think the improvement would be to have ds.transpose actually change the order of dims for each data var to match the order imposed.

dcherian commented 1 week ago

It does actually do that. to_dataframe lets you specify an order using the dim_order kwarg: https://docs.xarray.dev/en/stable/generated/xarray.DataArray.to_dataframe.html
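A small sketch of both points, using a hypothetical single-variable Dataset (the name `temp` and the shapes are made up): `Dataset.transpose` reorders the dims of the data variable, and `to_dataframe` takes the index order through `dim_order`.

```python
import numpy as np
import xarray as xr

# Hypothetical single-variable Dataset, similar to the case in this issue.
ds = xr.Dataset({"temp": (("time", "x", "y"), np.zeros((2, 3, 4)))})

# Dataset.transpose reorders the dims of every data variable.
t = ds.transpose("y", "x", "time")
print(t["temp"].dims)   # ('y', 'x', 'time')
print(t["temp"].shape)  # (4, 3, 2)

# to_dataframe builds its row MultiIndex in the order given by dim_order.
df = ds.to_dataframe(dim_order=["y", "x", "time"])
print(list(df.index.names))  # ['y', 'x', 'time']
```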

openSourcerer9000 commented 1 week ago

Isn't the dim order already specified? The behavior seems strange to me. It may be ambiguous with many data vars, but with a single data var it should be pretty clear. I think we often use a single-var Dataset over a DataArray to merge variables or to avoid seeing "__xarray_dataarray_variable__" names pop up after serializing.

dcherian commented 1 week ago

Datasets do not, and will not, enforce consistency of dimension ordering among DataArrays.

So where it does matter, like in to_dataframe, we force you to be explicit and write out the dimension order you want for that function. We can't just pick the dim order of the first variable because not all variables have the same dimensions.
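A sketch of what "being explicit" looks like when variables do not share the same dims. The Dataset below is hypothetical; `dim_order` must list every dimension of the Dataset, and variables lacking a dim are broadcast across it.

```python
import numpy as np
import xarray as xr

# Hypothetical Dataset whose variables have different dims.
ds = xr.Dataset(
    {
        "A": (("time", "x"), np.arange(6.0).reshape(2, 3)),
        "B": ("x", np.array([10.0, 20.0, 30.0])),
    }
)

# There is no single "first variable" order that fits both.
print(ds["A"].dims, ds["B"].dims)  # ('time', 'x') ('x',)

# Explicit order: one row per (x, time) pair, with B repeated along time.
df = ds.to_dataframe(dim_order=["x", "time"])
print(list(df.index.names))  # ['x', 'time']
print(len(df))               # 6
```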

keewis commented 1 week ago

We can't just pick the dim order of the first variable because not all variables have the same dimensions.

... and thus Dataset.to_dataframe uses ds.sizes as the default dimension order, which is not affected by Dataset.transpose.
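A sketch of that default, assuming the behavior described in this thread (the Dataset is hypothetical). After `transpose` the data variable is reordered, but with no `dim_order` the index levels follow the Dataset-level dims in `sizes`. Passing the variable's own dims gives the transposed layout.

```python
import numpy as np
import xarray as xr

# Hypothetical single-variable Dataset.
ds = xr.Dataset({"temp": (("time", "x"), np.zeros((2, 3)))})
t = ds.transpose("x", "time")

print(t["temp"].dims)  # ('x', 'time'): the data variable itself is reordered

# Without dim_order, the index levels follow t.sizes, which (per the comment
# above) Dataset.transpose does not reorder.
default_df = t.to_dataframe()
print(list(default_df.index.names), list(t.sizes))

# Using the variable's own dims as dim_order yields the transposed layout.
explicit_df = t.to_dataframe(dim_order=list(t["temp"].dims))
print(list(explicit_df.index.names))  # ['x', 'time']
```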

I think we often use a single-var Dataset over a DataArray to merge variables or to avoid seeing "__xarray_dataarray_variable__" names pop up after serializing.

You can assign a name to an unnamed DataArray, which will be used by to_dataframe:

arr.rename("variable")
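A slightly fuller sketch of that suggestion, with a hypothetical array. Once named, the DataArray can be converted directly. `DataArray.to_dataframe` uses the array's own dims as the default index order, and the name also becomes the data variable name in `to_dataset`.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros((2, 3)), dims=("time", "x"))  # hypothetical, unnamed

named = arr.rename("variable")
print(named.name)  # variable

df = named.to_dataframe()
print(list(df.columns))      # ['variable']
print(list(df.index.names))  # ['time', 'x']

# The name carries over when wrapping the array in a Dataset.
print(list(named.to_dataset().data_vars))  # ['variable']

# Alternatively, pass the name at conversion time without renaming.
df2 = arr.to_dataframe(name="variable")
```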