Support .reindex with DataArrays and Dataset as indexers

weipeng1999 commented 1 year ago

I want to select data using a Dataset as indexers but fill missing values with NaN. However, I found that the .sel method requires every indexer label to match. After searching the docs, I learned that the .reindex method allows labels that do not match and fills them with NaN. Unfortunately, .reindex does not support DataArrays and Datasets as indexers. I think the best way to handle this situation is to add that support to the .reindex method, so we keep maximum compatibility and minimize changes to existing behavior. What do you think?

dcherian commented 1 year ago

Thanks @weipeng1999 . Can you please provide a minimal example showing the syntax and expected output?

weipeng1999 commented 1 year ago

Can I just copy the example from the docs and use comments to mark the changes?

In [100]: da = xr.DataArray(
    ....:     np.random.rand(4, 2),
    ....:     [
    ....:         ("time", pd.date_range("2000-01-01", periods=4)),
    ....:         ("space", ["IA", "IL"]),  # note: no "IN" label here
    ....:     ],
    ....: )

In [101]: times = xr.DataArray(
    ....:     pd.to_datetime(["2000-01-03", "2000-01-02", "2000-01-01"]), dims="new_time"
    ....: )

In [102]: # use .reindex instead of .sel
    ....: # and pass the fill_value parameter
    ....: da.reindex(space=xr.DataArray(["IA", "IL", "IN"], dims=["new_time"]), time=times, fill_value=np.nan)
Out[102]: 
<xarray.DataArray (new_time: 3)>
array([0.92, 0.34,  nan])  # the missing "IN" value is filled with np.nan
Coordinates:
    time      (new_time) datetime64[ns] 2000-01-03 2000-01-02 2000-01-01
    space     (new_time) <U2 'IA' 'IL' 'IN'
  * new_time  (new_time) datetime64[ns] 2000-01-03 2000-01-02 2000-01-01
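Until something like this exists, a similar result can probably be approximated with the current API in two steps: first outer-reindex each indexed dimension so that every requested label exists (missing ones become NaN), then run the usual vectorized `.sel`. This is a sketch under that assumption only; `reindex_pointwise` is a hypothetical helper, not part of xarray, and it handles DataArray (or list) indexers but not a Dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same toy array as in the example above: "space" has no "IN" label.
da = xr.DataArray(
    np.random.rand(4, 2),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL"]),
    ],
)

times = xr.DataArray(
    pd.to_datetime(["2000-01-03", "2000-01-02", "2000-01-01"]), dims="new_time"
)
spaces = xr.DataArray(["IA", "IL", "IN"], dims="new_time")


def reindex_pointwise(obj, fill_value=np.nan, **indexers):
    """Hypothetical helper (not an xarray API): pointwise selection that
    fills labels missing from the index with ``fill_value``.

    Step 1: outer-reindex each indexed dimension onto its existing labels
    plus any requested labels it lacks, so every requested label exists.
    Step 2: do ordinary vectorized ``.sel`` with the DataArray indexers.
    """
    expanded = {}
    for dim, idx in indexers.items():
        existing = obj.indexes[dim]
        requested = pd.Index(np.asarray(idx).ravel())
        # Append only the labels that are not already in the index.
        expanded[dim] = existing.append(requested.difference(existing))
    return obj.reindex(expanded, fill_value=fill_value).sel(indexers)


result = reindex_pointwise(da, space=spaces, time=times)
print(result)
```

The design choice here is to keep `.sel` strict and push the "allow missing labels" step into a plain 1-D `.reindex` beforehand, which mirrors the split of responsibilities the issue asks for.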

weipeng1999 commented 1 year ago

This way, we can guarantee that:

.reindex: sets missing values to fill_value (default is NaN), so the result may contain NaN.
.fillna: fills the NaNs in the data, so the result contains no NaN.
.sel: keeps its current behavior unchanged.
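For reference, the same split of responsibilities can already be seen with today's one-dimensional (non-vectorized) indexers. The small array below is made up for illustration; the point is only the behavior of each method on a missing "IN" label.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0], coords={"space": ["IA", "IL"]}, dims="space")

# .reindex: labels missing from the index are filled with fill_value (NaN by default)
reindexed = da.reindex(space=["IA", "IL", "IN"])  # values: 1.0, 2.0, NaN

# .fillna: replaces the NaNs, so the result no longer contains NaN
filled = reindexed.fillna(0.0)  # values: 1.0, 2.0, 0.0

# .sel: stays strict and raises KeyError when a requested label is missing
raised = False
try:
    da.sel(space=["IA", "IN"])
except KeyError:
    raised = True
print(reindexed.values, filled.values, raised)
```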

pydata/xarray #7193