pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.62k stars 1.08k forks source link

Unstack non-index coordinate #4397

Open kripnerl opened 4 years ago

kripnerl commented 4 years ago

What happened:

Non-indexed coordinates are not unstacked correctly. The example below should be self-explanatory.

In the example below, I create a surface of the torus which should be used for some calculation for each point - therefore I have to use multi-index. The surface is given by the set of (R, Z) -- indexed by pol_idx -- which is the same for each toroidal angle phi. At the end of the day I would like to have the mesh of ('phi', 'pol_idx').

What you expected to happen:

ds.stack(multiindex=("a", "b")).unstack("multindex")should returnds`.

Minimal Complete Verifiable Example:

theta = np.linspace(0, np.pi)
R0 = 1

grid = xr.Dataset(coords={"R": ("pol_idx", R0 * np.cos(theta)), 
                          "Z": ("pol_idx", R0 * np.sin(theta)), 
                          "phi": np.linspace(0, 2*np.pi, N_phi_points, endpoint=False)})
> <xarray.Dataset>
> Dimensions:  (phi: 256, pol_idx: 50)
> Coordinates:
>    R        (pol_idx) float64 1.0 0.9979 0.9918 0.9816 ... -0.9918 -0.9979 -1.0
>    Z        (pol_idx) float64 0.0 0.06407 0.1279 ... 0.1279 0.06407 1.225e-16
>  * phi      (phi) float64 0.0 0.02454 0.04909 0.07363 ... 6.21 6.234 6.259
> Dimensions without coordinates: pol_idx
> Data variables:
>   *empty*

grid = grid.stack(index=("pol_idx", "phi"))

> <xarray.Dataset>
> Dimensions:  (index: 12800)
> Coordinates:
>     R        (index) float64 1.0 1.0 1.0 1.0 1.0 ... -1.0 -1.0 -1.0 -1.0 -1.0
>     Z        (index) float64 0.0 0.0 0.0 0.0 ... 1.225e-16 1.225e-16 1.225e-16
>   * index    (index) MultiIndex
>   - pol_idx  (index) int64 0 0 0 0 0 0 0 0 0 0 ... 49 49 49 49 49 49 49 49 49 49
>   - phi      (index) float64 0.0 0.02454 0.04909 0.07363 ... 6.21 6.234 6.259
> Data variables:
>     *empty*

# Here I would like to perform some calculation on the prepared grid over the multiindex.

grid = grid.unstack("index")

> <xarray.Dataset>
> Dimensions:  (phi: 256, pol_idx: 50)
> Coordinates:
>   R        (pol_idx, phi) float64 1.0 1.0 1.0 1.0 1.0 ... -1.0 -1.0 -1.0 -1.0
>   Z        (pol_idx, phi) float64 0.0 0.0 0.0 ... 1.225e-16 1.225e-16
>  * pol_idx  (pol_idx) int64 0 1 2 3 4 5 6 7 8 9 ... 41 42 43 44 45 46 47 48 49
>   * phi      (phi) float64 0.0 0.02454 0.04909 0.07363 ... 6.21 6.234 6.259
> Data variables:
>     *empty*

Comment:

I am not sure whether this is a bug. However, I find this behaviour counterintuitive.

Environment:

Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-112-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.1 xarray: 0.14.1 pandas: 0.25.1 numpy: 1.17.2 scipy: 1.3.1 netCDF4: 1.4.2 pydap: None h5netcdf: 0.7.4 h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 dask: 2.5.2 distributed: 2.5.2 matplotlib: 3.1.3 cartopy: None seaborn: 0.9.0 numbagg: None setuptools: 41.4.0 pip: 19.2.3 conda: 4.8.3 pytest: 5.2.1 IPython: 7.8.0 sphinx: 2.2.0
stale[bot] commented 2 years ago

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

dcherian commented 2 years ago

It seems like one issue here is that stack + unstack doesn't roundtrip a non-dim coord variable we start with

> Coordinates:
>    R        (pol_idx) float64 1.0 0.9979 0.9918 0.9816 ... -0.9918 -0.9979 -1.0
>    Z        (pol_idx) float64 0.0 0.06407 0.1279 ... 0.1279 0.06407 1.225e-16

and end with

> Coordinates:
>   R        (pol_idx, phi) float64 1.0 1.0 1.0 1.0 1.0 ... -1.0 -1.0 -1.0 -1.0
>   Z        (pol_idx, phi) float64 0.0 0.0 0.0 ... 1.225e-16 1.225e-16

But I don't think that's unexpected given the report.

@kripnerl Sorry for the really late response. Can you clarify which behaviour is surprising to you here?