pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

DataArray.transpose with transpose_coords=True does not change coords order #7294

Open templiert opened 1 year ago

templiert commented 1 year ago

What happened?

I used DataArray.transpose with transpose_coords=True to change the coords order from starting_dims = "dim_0", "dim_1", "dim_2" to reordered_dims = "dim_2", "dim_1", "dim_0".

The order of dims was correctly transposed but the order of coords remained unchanged.

What did you expect to happen?

I expected the transposed coords to be in the new order:

reordered_dims = "dim_2", "dim_1", "dim_0"

Minimal Complete Verifiable Example

import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(0)
temperature = np.random.randn(4, 4, 3)
dim_0_values = [1, 2, 3, 4]
dim_1_values = [5, 6, 7, 8]
dim_2_values = pd.date_range("2014-09-06", periods=3)
starting_dims = "dim_0", "dim_1", "dim_2"

da = xr.DataArray(
    data=temperature,
    dims=starting_dims,
    coords=dict(
        dim_0=dim_0_values,
        dim_1=dim_1_values,
        dim_2=dim_2_values,
    ),
    attrs=dict(
        description="Ambient temperature.",
        units="degC",
    ),
)

print(f"{da.dims=}")
print(f"{da.coords.keys()=}")

reordered_dims = "dim_2", "dim_1", "dim_0"
print(f"{da.transpose(*reordered_dims).dims=}")
print(f"{da.transpose(*reordered_dims).coords.keys()=}")
print(f"{da.transpose(*reordered_dims, transpose_coords=True).coords.keys()=}")

MVCE confirmation

Relevant log output

da.dims=('dim_0', 'dim_1', 'dim_2')
da.coords.keys()=KeysView(Coordinates:
  * dim_0    (dim_0) int32 1 2 3 4
  * dim_1    (dim_1) int32 5 6 7 8
  * dim_2    (dim_2) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08)
da.transpose(*reordered_dims).dims=('dim_2', 'dim_1', 'dim_0')
da.transpose(*reordered_dims).coords.keys()=KeysView(Coordinates:
  * dim_0    (dim_0) int32 1 2 3 4
  * dim_1    (dim_1) int32 5 6 7 8
  * dim_2    (dim_2) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08)
da.transpose(*reordered_dims, transpose_coords=True).coords.keys()=KeysView(Coordinates:
  * dim_0    (dim_0) int32 1 2 3 4
  * dim_1    (dim_1) int32 5 6 7 8
  * dim_2    (dim_2) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08)

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 (main, Apr 4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: 1.10.6
libnetcdf: None

xarray: 2022.6.0
pandas: 1.4.2
numpy: 1.21.5
scipy: 1.9.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.13.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 61.2.0
pip: 22.3.1
conda: 4.12.0
pytest: 7.1.1
IPython: 8.2.0
sphinx: 4.4.0

headtr1ck commented 1 year ago

The coordinates are just a mapping from names to DataArrays; I don't think their order has any meaning (the same as for normal dicts).
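A quick pure-Python illustration of that point, using plain dicts as stand-ins for the coordinate mapping (an assumption; this does not touch xarray itself). Two dicts with the same items compare equal regardless of insertion order; only iteration, and therefore anything printed from it, follows the order:

```python
# Plain dicts standing in for the coordinate mapping (not xarray objects).
original = {"dim_0": 1, "dim_1": 2, "dim_2": 3}
reordered = {"dim_2": 3, "dim_1": 2, "dim_0": 1}

# Equality and lookup by name ignore insertion order...
assert original == reordered
assert original["dim_2"] == reordered["dim_2"]

# ...but iteration order (and hence a repr built from it) differs.
print(list(original))   # ['dim_0', 'dim_1', 'dim_2']
print(list(reordered))  # ['dim_2', 'dim_1', 'dim_0']
```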

templiert commented 1 year ago

Thanks. On one hand I see your point: a mapping does not ensure order. On the other hand, is it not counter-intuitive that the order of the dictionary-like container coords is not changed when using transpose_coords=True?

Here is my reasoning:

headtr1ck commented 1 year ago

And what should happen with non-dimension coordinates? Should they simply end up at the end?

headtr1ck commented 1 year ago

The logic for this is not really trivial unless I am missing some obvious trick. You basically have to change the loop here: https://github.com/pydata/xarray/blob/ff6793d975ef4d1d8d5d32b8ad6f4f44e02dda9b/xarray/core/dataarray.py#L2919 so that it loops first over the new dims and any coordinates with the same names, and then over the rest of the coordinates (if possible in a single loop without code repetition).

Feel free to propose a PR!
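A minimal sketch of the ordering part, assuming non-dimension coordinates simply go at the end. `reordered_coord_names` is a hypothetical helper, not xarray code, and it assumes `new_dims` is already fully expanded (no `...`). A single stable `sorted()` call covers both groups without repeating the loop body:

```python
from collections.abc import Hashable, Iterable, Sequence


def reordered_coord_names(
    coord_names: Iterable[Hashable], new_dims: Sequence[Hashable]
) -> list[Hashable]:
    """Hypothetical helper: dimension coordinates first, in new_dims order,
    followed by all other coordinates in their original relative order."""
    rank = {dim: position for position, dim in enumerate(new_dims)}
    # Every non-dimension coordinate gets the same key, len(new_dims); because
    # sorted() is stable, they keep their original relative order at the end.
    return sorted(coord_names, key=lambda name: rank.get(name, len(new_dims)))


# The example from this issue: only dimension coordinates.
print(reordered_coord_names(["dim_0", "dim_1", "dim_2"], ("dim_2", "dim_1", "dim_0")))
# ['dim_2', 'dim_1', 'dim_0']

# A mix of dimension and non-dimension coordinates.
print(reordered_coord_names(["xc", "time", "yc"], ("x", "y", "time")))
# ['time', 'xc', 'yc']
```

The resulting name order could then drive the existing loop that builds the new coordinate dict, so the per-coordinate transposition logic would not need to be duplicated.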

headtr1ck commented 1 year ago

Just noticed that the same logic does not work for Datasets, since all variables are kept in a common dict and the information about which variables are coordinates and which are data variables is kept in a set, which is not ordered...
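A hypothetical pure-Python model of that layout may help show the difficulty. The names `variables`, `coord_names`, `coord_order` and `reorder_coord_slots` are illustrative assumptions, not xarray's API. Because the set carries no order, coordinate order can only be read off the shared dict, so any reordering has to rebuild that dict. This sketch reuses the slots that coordinates already occupy, so data variables keep their positions:

```python
def coord_order(variables, coord_names):
    """Coordinate names in the order they appear in the shared dict."""
    # The set has no order, so the order is taken from the dict.
    return [name for name in variables if name in coord_names]


def reorder_coord_slots(variables, coord_names, new_coord_order):
    """Hypothetical: rebuild the shared dict with coordinates reordered.

    Each slot previously held by a coordinate is filled with the next name
    from new_coord_order; data variables stay where they were.
    """
    if len(new_coord_order) != len(coord_names) or set(new_coord_order) != set(coord_names):
        raise ValueError("new_coord_order must be a permutation of coord_names")
    replacements = iter(new_coord_order)
    result = {}
    for name in variables:
        key = next(replacements) if name in coord_names else name
        result[key] = variables[key]
    return result


# Strings stand in for the actual variables.
variables = {"dim_0": "c0", "temperature": "data", "dim_1": "c1", "dim_2": "c2"}
coord_names = {"dim_0", "dim_1", "dim_2"}
reordered = reorder_coord_slots(variables, coord_names, ["dim_2", "dim_1", "dim_0"])
print(list(reordered))                      # ['dim_2', 'temperature', 'dim_1', 'dim_0']
print(coord_order(reordered, coord_names))  # ['dim_2', 'dim_1', 'dim_0']
```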

keewis commented 1 year ago

transpose_coords is used to transpose coordinates with multiple dimensions:

In [1]: import xarray as xr
   ...: 
   ...: ds = xr.tutorial.open_dataset("rasm")
   ...: ds.Tair.attrs.clear()
   ...: ds.Tair
Out[1]: 
<xarray.DataArray 'Tair' (time: 36, y: 205, x: 275)>
[2029500 values with dtype=float64]
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: y, x

In [2]: ds.Tair.transpose("x", "y", ..., transpose_coords=False)
Out[2]: 
<xarray.DataArray 'Tair' (x: 275, y: 205, time: 36)>
[2029500 values with dtype=float64]
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: x, y

In [3]: ds.Tair.transpose("x", "y", ..., transpose_coords=True)
Out[3]: 
<xarray.DataArray 'Tair' (x: 275, y: 205, time: 36)>
[2029500 values with dtype=float64]
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (x, y) float64 ...
    yc       (x, y) float64 ...
Dimensions without coordinates: x, y
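What transpose_coords=True does to a single coordinate in Out[3] can be modeled with plain NumPy. This is a sketch under the assumption that a coordinate is just an array plus a tuple of dim names; `transpose_coord` is a hypothetical helper, not xarray's implementation:

```python
import numpy as np


def transpose_coord(data, coord_dims, new_dims):
    """Hypothetical helper: reorder one coordinate's axes to follow new_dims.

    Only the dims the coordinate actually spans are kept, in new_dims order.
    """
    target_dims = tuple(dim for dim in new_dims if dim in coord_dims)
    axes = [coord_dims.index(dim) for dim in target_dims]
    return np.transpose(data, axes), target_dims


# Stand-in for the 2-D "xc" coordinate above: dims (y, x), shape (205, 275).
xc = np.zeros((205, 275))
xc_t, xc_dims = transpose_coord(xc, ("y", "x"), ("x", "y", "time"))
print(xc_dims, xc_t.shape)  # ('x', 'y') (275, 205)
```

For a 1-D coordinate such as `time`, `target_dims` is just `("time",)` and the array comes back with the same shape, so the option has no visible effect on purely 1-D dimension coordinates.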

Interestingly, transpose_coords is only an option for DataArray.transpose, and it defaults to True. This means that both transpose calls in the example from https://github.com/pydata/xarray/issues/7294#issue-1452123685 do exactly the same thing, so even if we did implement the reordering, nothing would change between them.

As such, I'm -0.5 on changing the order in which the coordinates are stored, since the only time that order is used is the repr / HTML repr. In the past we have actually considered sorting the coordinates alphabetically, which did not happen because the coordinate names can be hashables of arbitrary types, and comparing a pair of hashables of different types is not easy.
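A small pure-Python illustration of why sorting arbitrary hashables is awkward. Plain `sorted()` raises `TypeError` as soon as it has to compare, say, an `int` with a `str`. `sort_names` below is a hypothetical workaround (sort by type name, then `repr`), not something xarray does:

```python
def sort_names(names):
    """Hypothetical total order over arbitrary hashables: by type name, then repr."""
    return sorted(names, key=lambda name: (type(name).__name__, repr(name)))


names = ["lon", 0, ("x", 1), "lat"]
try:
    sorted(names)  # comparing an int with a str raises TypeError
except TypeError as err:
    print("sorted() failed:", err)

print(sort_names(names))    # [0, 'lat', 'lon', ('x', 1)]
print(sort_names([9, 10]))  # [10, 9], because '10' < '9' as strings
```

The order this produces is deterministic but not very natural (integers compare by their string form), which illustrates why alphabetical sorting of coordinate names was not adopted.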