pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

drop_sel returns KeyError for abbreviated dates #7699

Open scottyhq opened 1 year ago

scottyhq commented 1 year ago

What happened?

For a DataArray with a high-precision timestamp coordinate (e.g. 2022-03-11T13:50:50.551314000),

da.sel(time="2022-03-11") works, but da.drop_sel(time="2022-03-11") raises a KeyError.

What did you expect to happen?

I expect drop_sel() to accept the same inputs as sel().

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr
times = np.array(['2022-03-03T13:50:49.548072000', '2022-03-11T13:50:50.551314000',
                  '2022-03-12T13:44:35.547024000', '2022-03-13T12:18:48.015182000'],
                  dtype='datetime64[ns]')

da = xr.DataArray(np.arange(4), dims="time", coords={"time": times})
da.drop_sel(time='2022-03-11')  # raises KeyError: "['2022-03-11'] not found in axis"

A workaround is to use the exact datestring:

da.drop_sel(time='2022-03-11T13:50:50.551314000')
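
A more general form of this workaround, as a minimal sketch: let sel() resolve the abbreviated date to the exact coordinate values, then pass those values to drop_sel(). The helper name drop_sel_via_sel is hypothetical and not part of xarray; it only combines the two calls already shown in this report.

```python
import numpy as np
import xarray as xr

times = np.array(
    ["2022-03-03T13:50:49.548072000", "2022-03-11T13:50:50.551314000",
     "2022-03-12T13:44:35.547024000", "2022-03-13T12:18:48.015182000"],
    dtype="datetime64[ns]",
)
da = xr.DataArray(np.arange(4), dims="time", coords={"time": times})


def drop_sel_via_sel(obj, dim, label):
    """Hypothetical helper: drop whatever sel() would have selected."""
    # sel() understands abbreviated dates, so use it to find the exact labels.
    matched = obj[dim].sel({dim: label})
    # atleast_1d covers the case where sel() returns a single scalar label.
    exact_labels = np.atleast_1d(matched.values)
    return obj.drop_sel({dim: exact_labels})


result = drop_sel_via_sel(da, "time", "2022-03-11")
print(result)
```

This avoids typing the full timestamp by hand, at the cost of an extra lookup.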

Relevant log output

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[4], line 1
----> 1 da.drop_sel(time='2022-03-11')

File ~/.local/envs/xvec/lib/python3.11/site-packages/xarray/core/dataarray.py:3105, in DataArray.drop_sel(self, labels, errors, **labels_kwargs)
   3102 if labels_kwargs or isinstance(labels, dict):
   3103     labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")
-> 3105 ds = self._to_temp_dataset().drop_sel(labels, errors=errors)
   3106 return self._from_temp_dataset(ds)

File ~/.local/envs/xvec/lib/python3.11/site-packages/xarray/core/dataset.py:5292, in Dataset.drop_sel(self, labels, errors, **labels_kwargs)
   5290     except KeyError:
   5291         raise ValueError(f"dimension {dim!r} does not have coordinate labels")
-> 5292     new_index = index.drop(labels_for_dim, errors=errors)
   5293     ds = ds.loc[{dim: new_index}]
   5294 return ds

File ~/.local/envs/xvec/lib/python3.11/site-packages/pandas/core/indexes/base.py:6934, in Index.drop(self, labels, errors)
   6932 if mask.any():
   6933     if errors != "ignore":
-> 6934         raise KeyError(f"{list(labels[mask])} not found in axis")
   6935     indexer = indexer[~mask]
   6936 return self.delete(indexer)

KeyError: "['2022-03-11'] not found in axis"

Anything else we need to know?

related to https://github.com/pydata/xarray/issues/6605#issuecomment-1126139617

Environment

(xarray: 2023.3.0)

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.231-137.341.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.11.0
sphinx: None

dcherian commented 1 year ago

Thanks! We definitely want everything that works for sel to also work for drop_sel.

https://github.com/pydata/xarray/blob/44488288fd8309e3468ee45a5f7408d75a21f493/xarray/core/dataset.py#L5292

index.get_indexer(labels_for_dim) will get you the correct indices to drop, though there's probably another method that could work.

cc @benbovy
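
One candidate for the "another method" mentioned above might be DatetimeIndex.get_loc, which applies pandas' partial-string matching. Index.drop, as the traceback shows, only looks for the label as given. The following is a minimal pandas-only sketch of that idea, not the actual xarray fix; the helper name drop_partial is hypothetical.

```python
import numpy as np
import pandas as pd

index = pd.DatetimeIndex(
    np.array(
        ["2022-03-03T13:50:49.548072000", "2022-03-11T13:50:50.551314000",
         "2022-03-12T13:44:35.547024000", "2022-03-13T12:18:48.015182000"],
        dtype="datetime64[ns]",
    )
)


def drop_partial(index, label):
    """Hypothetical helper: drop every entry matched by a partial date string."""
    # get_loc may return an int, a slice, or an array depending on the index;
    # indexing an arange normalizes all of these to integer positions.
    loc = index.get_loc(label)
    positions = np.atleast_1d(np.arange(len(index))[loc])
    return index.delete(positions)


remaining = drop_partial(index, "2022-03-11")
print(remaining)
```

A real fix inside Dataset.drop_sel would also need to handle lists of labels and the errors="ignore" option, which this sketch leaves out.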