Open yichechang opened 12 months ago
Thanks for the issue. What would you expect the output to be?
It does seem surprising that passing two arguments succeeds while passing each of them alone fails...
A partial look: adding .drop_vars('d1') makes it succeed, because the argument carries a scalar d1 coordinate.
data.sel(d1='m')==0
<xarray.DataArray (d2: 3)>
array([ True, False, False])
Coordinates:
    d1       <U1 'm'
  * d2       (d2) <U1 'a' 'b' 'c'
data.assign_coords(
    {
        'mask_d1_m': (data.sel(d1='n')==0).drop_vars('d1')
        # 'mask_d1_n': data.sel(d1='n')==0
    }
)
...though I'm not sure why it succeeds when there are two arguments.
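To make the point about the leftover d1 label concrete, here is a minimal sketch that inspects the mask before and after .drop_vars('d1'). It rebuilds the example array from this issue (passing dims explicitly is my own choice, not part of the original report) and only uses attributes I'm confident exist (.dims, .coords, .drop_vars).

```python
import xarray as xr

# Same toy array as in the issue; dims are passed explicitly for clarity.
data = xr.DataArray(
    [[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)

# Selecting a single label removes the d1 *dimension* ...
mask = data.sel(d1="m") == 0
print(mask.dims)

# ... but the selected label stays attached as a 0-d (scalar) coordinate.
print("d1" in mask.coords, mask.coords["d1"].ndim)

# drop_vars removes that scalar coordinate, leaving only d2.
dropped = mask.drop_vars("d1")
print(list(dropped.coords))
```

So the mask no longer has a d1 dimension, but it still carries a d1 label, and that label is what assign_coords then has to reconcile with the d1 dimension coordinate of the original array.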
Hi - Thanks for the reply and testing!
I guess I would expect my examples to either all fail or all succeed.
Sorry I didn't include what I would expect; I didn't really know what to expect since I hadn't thought about why it failed. (But it seems like it should fail, as it has now become clearer to me that my mask_d1_m should not have dimension d1?) From the error message, I would guess that doing .drop_vars('d1') would fix the problem (which is true, as you demonstrated above!), but I couldn't wrap my head around why a DataArray of dim ('d1', 'd2') cannot be passed as coordinates for a DataArray of dim ('d1', 'd2')?
Edit:
Also just wanted to point out that it's not just having two arguments that prevents it from failing; the two arguments also have to carry different d1 values, d1='m' and d1='n'.
import xarray as xr
data = xr.DataArray(
    data=[
        [0, 1, 2],
        [0, 1, 2]
    ],
    coords={
        'd1': ['m', 'n'],
        'd2': ['a', 'b', 'c']
    }
)
# so this will FAIL
data.assign_coords(
    {
        'mask_d1_m': data.sel(d1='m')==0,
        #                        ^^^
        'mask_d1_n': data.sel(d1='m')==1,
        #                        ^^^
    }
)
# but this will SUCCEED
data.assign_coords(
    {
        'mask_d1_m': data.sel(d1='m')==0,
        #                        ^^^
        'mask_d1_n': data.sel(d1='n')==1,
        #                        ^^^
    }
)
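For anyone who wants to check these cases against their own xarray version, here is a small sketch that runs each combination and records whether it raised. The helper try_assign and the case names are made up for illustration. I deliberately don't list expected outcomes, since they may differ between xarray releases.

```python
import xarray as xr

data = xr.DataArray(
    [[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)


def try_assign(new_coords):
    """Hypothetical helper: return 'ok' or the name of the raised exception."""
    try:
        data.assign_coords(new_coords)
    except Exception as err:  # broad on purpose: we only want to record the outcome
        return type(err).__name__
    return "ok"


cases = {
    "m only": {"mask_d1_m": data.sel(d1="m") == 0},
    "m and m": {
        "mask_d1_m": data.sel(d1="m") == 0,
        "mask_d1_n": data.sel(d1="m") == 1,
    },
    "m and n": {
        "mask_d1_m": data.sel(d1="m") == 0,
        "mask_d1_n": data.sel(d1="n") == 1,
    },
}

outcomes = {name: try_assign(new_coords) for name, new_coords in cases.items()}
for name, outcome in outcomes.items():
    print(f"{name}: {outcome}")
```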
Just wanted to give an update: for my own application (computing masks for a DataArray from its own data, then attaching the resulting masks to the DataArray as new coords), I should make sure in the first place that my masks contain only labels for their own dimensions. So, something like
# compute mask based on my data
# mask = data.sel(d1=...) > whatever
# remove labels for extra dimensions
mask = mask.drop_vars([v for v in mask.coords if v not in mask.dims])
# assign mask as coords for the original DataArray
data = data.assign_coords({'my_mask': mask})
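Spelled out as a runnable sketch, the steps above could look like this. The helper name drop_non_dim_coords and the threshold (> 0) are placeholders of mine, not anything from xarray itself.

```python
import xarray as xr

data = xr.DataArray(
    [[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)


def drop_non_dim_coords(arr):
    """Hypothetical helper: keep only coordinates named after one of arr's dims."""
    extra = [name for name in arr.coords if name not in arr.dims]
    return arr.drop_vars(extra)


# compute a mask from the data; the scalar d1 label comes along with it
mask = data.sel(d1="n") > 0

# remove labels that do not belong to the mask's own dimensions
mask = drop_non_dim_coords(mask)

# attach the mask as a new coordinate along d2
result = data.assign_coords({"my_mask": mask})
print(result.coords["my_mask"].values)
```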
But the inconsistent behavior still seems like a bug, or there's some magic happening while combining multiple coords assignments that is not explicitly documented.
Therefore, I'll leave this issue open, but please feel free to close it if this isn't something to be fixed. From my point of view, the inconsistency is surprising, but not a major issue. Thank you for taking the time to test!
As far as I can tell, the cause of the surprising/inconsistent behavior is that xr.core.coordinates.create_coords_with_default_indexes (or xr.core.merge.merge_coordinates_without_align; I didn't step through create_coords_with_default_indexes) drops scalar coordinates with conflicting values. cc @benbovy
In your case, I think the easiest way to work around this is to use .sel(..., drop=True), which will not keep the scalar coordinate.
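A minimal sketch of that workaround, using the example array from this issue (dims passed explicitly, which is my addition):

```python
import xarray as xr

data = xr.DataArray(
    [[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)

# drop=True discards the d1 label instead of keeping it as a scalar coordinate
mask_m = data.sel(d1="m", drop=True) == 0
mask_n = data.sel(d1="n", drop=True) == 1

# with no d1 label on the masks, there is nothing to conflict during the merge
result = data.assign_coords({"mask_d1_m": mask_m, "mask_d1_n": mask_n})
print(result.coords["mask_d1_m"].values)
print(result.coords["mask_d1_n"].values)
```

Each mask can then be assigned on its own or together with the other, since none of them carries a d1 label any more.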
Hi @keewis - Thank you for the explanation!
Yeah, I was digging into the codebase a little bit, but unfortunately (as is probably evident, I'm not that familiar with xarray's internals) I was a bit lost. Not that I completely understand now, but I am grateful for an explanation so I know I'm not crazy 🤣
Also thank you for the recommended route with the drop kwarg in the DataArray.sel method. This is more succinct and conveys my intention more explicitly. Good to learn how to do things correctly... I think I need to sit down and re-learn xarray. I've been using it for a long time, but usually I can get it to do what I want while knowing there must be something I'm still missing.
Really appreciate all of your help : )
Since we are relaxing the constraints that are related to dimension coordinates (e.g., #7989), I'm wondering if we couldn't also relax the case where a scalar coordinate has the same name as a dimension.
I don't think that this would help much here, though. Using .assign_coords with a dictionary of DataArray objects extracts all their coordinates and tries to merge them with the current Dataset or DataArray. In many cases this is good, but sometimes it can have unwanted side effects. Using drop may help, or alternatively we can ignore all the coordinates (and keep the dimension names) by extracting the variable from the DataArray:
data.assign_coords({'mask_d1_m': (data.sel(d1='m')==0).variable})
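As a self-contained sketch of the .variable route (again with the example array from this issue, dims spelled out by me), one can check that the extracted object keeps its dimension names and that the original d1 coordinate is left untouched:

```python
import xarray as xr

data = xr.DataArray(
    [[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)

mask = data.sel(d1="m") == 0

# .variable is the underlying xr.Variable: dims and values, but no coordinates
var = mask.variable
print(type(var).__name__, var.dims)

# assigning the bare variable leaves nothing to merge besides the new coordinate
result = data.assign_coords({"mask_d1_m": var})
print(result.coords["mask_d1_m"].dims)
print(result.coords["d1"].values)
```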
What happened?
I'm trying to compute masks (from the DataArray's own data) and assign them as coordinates, but it appears that, depending on the combination of coords/dims of the computed masks, .assign_coords will sometimes fail.
It's a bit hard to describe as I don't know the xarray internals, but my self-contained minimal example below should demonstrate the issue much more clearly.
What did you expect to happen?
No response
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment