Open Illviljan opened 1 year ago
I'd say it is rather a feature. Despite being very similar to the original one, the new coord is not a dimension coordinate, which is thus not indexed by default so loc
or other label-based indexing won't work with it.
That said, we could probably make the error message a bit more user-friendly and verbose than just saying that no index is found, e.g.,
"No index found for coordinate 'x', which therefore cannot be used with .loc
or .sel
(label-based indexing). Set an index first for that coordinate using either set_index
or set_xindex
. Note that you can still use .iloc
and .isel
(integer-based indexing) when no index is set."
Would you mind showing the intended way for my example? Because I don't find this intuitive at all.
ds["air_new"] = ds.air.copy().assign_coords(
{"time_float": ds.time.astype(float)}
)
# ds["air_new"] = ds["air_new"].set_xindex("time_float") # Doesn't work
a = ds["air_new"].set_xindex("time_float")
air_new_changed = a[{"time": 0}] * 4
a.loc[air_new_changed.coords] = air_new_changed
ValueError: Xarray does not support label-based selection with more than one index over the following dimension(s):
'time': 2 indexes involved
Suggestion: use a multi-index for each of those dimension(s).
Other tests:
ds.air_new.iloc[air_new_changed.coords] = air_new_changed # Error! :(
AttributeError: 'DataArray' object has no attribute 'iloc'
ds.air_new.sel[air_new_changed.coords] = air_new_changed # Error! :(
TypeError: 'method' object does not support item assignment
Haha yes agreed that is confusing. So improving the error messages will require more coordination across the codebase and thinking more about the possible cases. That is not an easy task.
Regarding your example, just setting a single index for "time_float" does not work: currently you cannot call loc
or sel
and pass labels at once for two or more independently indexed coordinates that share the same dimension (the error message mostly makes sense except the "dimension" term that is not accurate here). This because it is easier to raise now when such case occurs and maybe figure out later how we can merge results from different index queries along the same dimension.
(My bad, Xarray has no iloc
indeed... I'm mostly using isel/sel).
I'm not sure what exactly you're trying to achieve in your example, actually.
item assignment on label selection using both the "time" and "time_float" coordinates? Then set a multi-index from both of them and call loc
. Or maybe chain two calls to loc
each with one of these coordinates but I'm not sure at all that it will work.
item assignment on label selection using only the "time" coordinate? Then be sure to not pass any mapping to loc
that contains a key referring to an unindexed coordinate (i.e., do not include "time_float" there).
I simply want to modify a few values in an array. :) A similar example in numpy-land:
a = np.array([1, 1, 1, 1])
b = a[0] * 3
a[0] = b
Now in xarray land this ds["air_new"]
happens to have a few helper coordinates. This is what I came up with and it took longer than I'd like to admit:
# Remove unneccessary coordinates without indexes:
ds["air_new"] = ds.air.copy().assign_coords({"time_float": ds.time.astype(float)})
air_new_changed = ds.air_new[{"time": 0}] * 3
ds.air_new.loc[
{
k: air_new_changed.coords[k].values
for k in air_new_changed.coords.keys() & air_new_changed.indexes.keys()
}
] = air_new_changed
Is this the most elegant way? How would you have done it?
Using masks doesn't have this issue for some reason:
# Use a mask instead...
ds["air_new"] = ds.air.copy().assign_coords({"time_float": ds.time.astype(float)})
mask = xr.ones_like(ds["air_new"], dtype=bool)
mask.loc[{"time": ds.time.isel(time=0)}] = False
ds["air_new"] = ds["air_new"].where(mask, 3 * ds["air_new"])
Why would I want to pass around helper coordinates?
ds["relative_time"] = ds.time - ds.time[0]
ds["time_float"] = ds.time.astype(float)
Yes helper coordinates are useful. However I don't really understand why do you want to pass labels to .loc
for all of those coordinates (or all the indexed ones) at once?
Since your helper coordinates are all derived from the same original coordinate, couldn’t you just select data using the one that is the most convenient to you ? E.g.,
ds.air_new.loc[{"time": air_new_changed.time.values}] = air_new_changed
This should yield the same results (and it is much more elegant) than
ds.air_new.loc[
{
k: air_new_changed.coords[k].values
for k in air_new_changed.xindexes.keys()
}
] = air_new_changed
Both the "time" and "time_float" coordinates share the same dimension and should be subset / propagated the same way.
I doubt that you want to update values on a selection using arbitrary (possibly non-overlapping) labels for each of those coordinates, do you? E.g.,
ds.air_new.loc[{"time"="2013-01-01", "time_float"=1.42e18}] = …
We don’t support that (yet) because it is more complicated. How should we join the integer array indices selected on the "time" dimension from the given "time" and "time_float" labels? By union, intersection or difference? Or do we want an exact match and raise otherwise? It might get even more complicated when multiple indexes, coordinates and/or dimensions are involved. In theory, we could support it but it would represent quite a lot of work and the implementation would quickly get much more complicated.
I can imagine how just passing all coordinates like
ds.air_new.loc[air_new_changed.coords] = air_new_changed
would be convenient as (with a carefully chosen default behavior) it removes some manual steps or thinking, similarly to Xarray auto-alignment. However, after we start considering more use cases we will quickly see that there is no easy generic solution, like for alignment (#7045). Note that we also don't support alignment when multiple (independent) indexes are found along the same dimension (example here) for the same reasons.
What happened?
I'm not sure if this is a bug or a feature but I was expecting this example to work since the new coord is just a slight rewrite of the original dimension coordinate: