Open DWesl opened 2 years ago
Hi @DWesl,
Your example is working with the main
branch, it has been fixed in #6889.
# after #6889
arr2.unstack("index0")
# <xarray.DataArray (index2: 2, level_0: 3, level_1: 5)>
# array([[[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14]],
#
# [[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14]]])
# Coordinates:
# * level_0 (level_0) int64 0 1 2
# * level_1 (level_1) int64 0 1 2 3 4
# Dimensions without coordinates: index2
Fix confirmed, thank you.
Hi @DWesl, @benbovy
I got a bug very similar to what was initially report with the following minimal example (xarray version 2022.11.0) with the error:
ValueError: cannot unstack dimensions that do not have exactly one multi-index: ('z',)
It is working fine with my previous version (xarray 0.16.0)
import xarray as xr
ds = xr.Dataset(
data_vars=dict(
mydata = ( ['across', 'along'], [[0,1],[2,3]])
)
)
stacked = ds.stack(z=("across", "along"))
newlist = [None] * stacked.z.size
for ii, zindex in enumerate(stacked.z.data):
newlist[ii] = stacked.mydata.sel(z=zindex)
newds = xr.concat(newlist, dim='z')
newds['z'] = stacked.z
print('xarray version:' + xr.__version__)
newds.unstack(dim='z')
If I do newds = newds.reset_index('z')
just before the last line, it is solved for my minimal example. But it still doesn't work for my real problem and I don't think it is the best way to proceed.
Many thanks for your help, Adrien
@ACHMartin the issue is when you do newds['z'] = stacked.z
. In the last versions of Xarray multi-index levels have each their own (real) coordinates, for consistency and clarity we soon won't support assigning a multi-index to a single coordinate of a Dataset / DataArray like that.
I think that in other places we still do support it with a deprecation notice, but apparently in your example this is not the case. unstack
doesn't work because the multi-index(es) and the coordinates of newds
are not consistent.
I don't know exactly what is your real problem, but from now on you should avoid implicitly assign a multi-index with xr_obj["my_coord"] = ...
or xr_obj.assign(my_coord=...)
. Instead you should re-create the multi-index, e.g., in your minimal example newds = newds.set_index(z=["across", "along"])
.
Many thanks for the explanation and the rapid answer. I am not very familiar with multi-index.
My real problem works with a DataSet with different sort of variables:
ipdb> lmmap
<xarray.Dataset>
Dimensions: (x_variables: 4, Observables: 2, Antenna: 4, z: 4,
Ambiguities: 4, fun_variables: 8, extrema: 2)
Coordinates:
* x_variables (x_variables) <U3 'u' 'v' 'c_u' 'c_v'
* Observables (Observables) <U6 'sigma0' 'RVL'
* Antenna (Antenna) <U4 'Fore' 'MidV' 'MidH' 'Aft'
fun_variables (Observables, Antenna) int64 0 1 2 3 4 5 6 7
Dimensions without coordinates: z, Ambiguities, extrema
Data variables: (12/17)
x (z, Ambiguities, x_variables) float64 -4.503 7.8 ... -0.4265
active_mask (z, Ambiguities, x_variables) int64 0 0 0 0 0 0 ... 0 0 0 0 0
grad (z, Ambiguities, x_variables) float64 7.816e-11 ... -1.873...
fun (z, Ambiguities, Observables, Antenna) float64 0.001137 .....
jac (z, Ambiguities, fun_variables, x_variables) float64 0.359...
x0 (z, Ambiguities, x_variables) float64 -7.002 10.03 ... 1.219
... ...
message (z, Ambiguities) <U42 '`xtol` termination condition is sat...
success (z, Ambiguities) bool True True True True ... True True True
method (z) <U3 'trf' 'trf' 'trf' 'trf'
xtol (z) float64 0.001 0.001 0.001 0.001
x_scale (z, x_variables) float64 7.0 7.0 0.5 0.5 ... 7.0 7.0 0.5 0.5
bounds (z, extrema, x_variables) int64 -30 -30 -5 -5 ... 30 30 5 5
I don't know where I should apply set_index(z=["across", "along"])
I tried on the full DataSet or a single DataArray but without success. I get the following error:
ValueError: across, along variable(s) do not exist
Which is obvious as across and along variables disappeared. Thank you, I will dig in this direction why it happens.
However, it works perfectly with the minimal example. Many thanks
the issue is when you do newds['z'] = stacked.z
Are we raising a warning here?
I didn't see any warning for this
Many thanks for the explanation and the rapid answer. I am not very familiar with multi-index.
My real problem works with a DataSet with different sort of variables:
ipdb> lmmap <xarray.Dataset> Dimensions: (x_variables: 4, Observables: 2, Antenna: 4, z: 4, Ambiguities: 4, fun_variables: 8, extrema: 2) Coordinates: * x_variables (x_variables) <U3 'u' 'v' 'c_u' 'c_v' * Observables (Observables) <U6 'sigma0' 'RVL' * Antenna (Antenna) <U4 'Fore' 'MidV' 'MidH' 'Aft' fun_variables (Observables, Antenna) int64 0 1 2 3 4 5 6 7 Dimensions without coordinates: z, Ambiguities, extrema Data variables: (12/17) x (z, Ambiguities, x_variables) float64 -4.503 7.8 ... -0.4265 active_mask (z, Ambiguities, x_variables) int64 0 0 0 0 0 0 ... 0 0 0 0 0 grad (z, Ambiguities, x_variables) float64 7.816e-11 ... -1.873... fun (z, Ambiguities, Observables, Antenna) float64 0.001137 ..... jac (z, Ambiguities, fun_variables, x_variables) float64 0.359... x0 (z, Ambiguities, x_variables) float64 -7.002 10.03 ... 1.219 ... ... message (z, Ambiguities) <U42 '`xtol` termination condition is sat... success (z, Ambiguities) bool True True True True ... True True True method (z) <U3 'trf' 'trf' 'trf' 'trf' xtol (z) float64 0.001 0.001 0.001 0.001 x_scale (z, x_variables) float64 7.0 7.0 0.5 0.5 ... 7.0 7.0 0.5 0.5 bounds (z, extrema, x_variables) int64 -30 -30 -5 -5 ... 30 30 5 5
I don't know where I should apply
set_index(z=["across", "along"])
I tried on the full DataSet or a single DataArray but without success. I get the following error:ValueError: across, along variable(s) do not exist
Which is obvious as across and along variables disappeared. Thank you, I will dig in this direction why it happens.
However, it works perfectly with the minimal example. Many thanks
Thanks @benbovy, my issue is now resolved, once I have enabled the coords to follow.
Hi @benbovy, my code works for 2 dimensions, but it doesn't with 1D. I think the error comes from .set_index(z=['across'])
which delivers a "PandasIndex(Int64Index(" and not the expected "PandasIndex(MultiIndex(" for unstack
.
I created a Pandas-MultiIndex "MultiIndex([((0,), (1,))], names=['z', 'across'])" (see code just below), but it is not exactly the same one as the one created by .stack()
"PandasIndex(MultiIndex([(0,),(1,)], name='z'))" and I didn't find out how to pass the 'index' to xarray.
import pandas as pd
tuples = [(0,),(1,)],
index = pd.MultiIndex.from_tuples(tuples, names=["z","across"])
index
My modified minimal example is as below:
import xarray as xr
ds = xr.Dataset(
data_vars=dict(
mydata = ( ['across', 'along'], [[0,1],[2,3]])
),
coords=dict( # <- changed
across=[0,1], # <- changed
along=[0,1], # <- changed
), # <- changed
)
stacked = ds.isel(along=0).stack(z=["across"]) # <- changed
newlist = [None] * stacked.z.size
for ii, zindex in enumerate(stacked.z.data):
newlist[ii] = stacked.mydata.sel(z=zindex)
newds = xr.concat(newlist, dim='z')
newds = newds.set_index(z=['across']) # <- changed
newds.unstack(dim='z') # Fail with ValueError: cannot unstack dimensions that do not have exactly one multi-index: ('z',)
Is the underlying cause of this a dupe of https://github.com/pydata/xarray/issues/7148?
I don't think it is a duplicate.
I don't know what @ACHMartin's last minimal example is trying to do, but now the best way to assign a pandas multi-index to a Xarray object is like this:
index = pd.MultiIndex.from_tuples(..., names=["across"])
coords = xr.Coordinates.from_pandas_multiindex(index, dim="z")
ds = xr.Dataset(coords=coords)
It looks like ds.set_index(z=["across"])
demotes to a single index in the case when a single item list is passed. I'm not sure what was the old behavior (a while ago... before we switched to calendar versioning) and whether or not it should be considered as a bug, though.
Regarding assignment of a new coordinate to an existing multi-index coordinate (also reported here above), if we still don't raise an error or a warning we should probably do so.
What happened?
I had a collection of
DataArray
s with a stacked dimension (dimension whose corresponding index is aMultiIndex
). I concatenated them into a singleDataArray
, then tried to unstack the stacked dimension, which failed. Performing the operations in the other order works (unstacking eachDataArray
, then concatenating the unstacked arrays).What did you expect to happen?
I expected that concatenating the arrays then unstacking them would produce the same array as unstacking them then concatenating them, but with the possibility of saving the intermediate concatenated-but-still-stacked
DataArray
for later use as a template.Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
The eventual problem to which I wish to apply the solution has two stacked dimensions rather than one, but that's likely irrelevant.
Environment