pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Selecting dates with .sel() doesn't work when time index is in cftime #7504

Open huaracheguarache opened 1 year ago

huaracheguarache commented 1 year ago

What happened?

Selecting a subset of a dataset/array by passing a list of date strings to .sel() fails when the time index uses cftime, and raises the following error:

KeyError: "not all values found in index 'time'"

What did you expect to happen?

I expect selecting a set of dates with a list to work the same way as when the time index is in datetime64.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

ds = xr.open_dataset("https://thredds.met.no/thredds/dodsC/osisaf/met.no/ice/index/v2p1/nh/osisaf_nh_sie_daily.nc")

# Time coordinates are in datetime64, and selecting dates with a list works.
print(ds.time)
print(ds.sel(time=["2023-01-01", "2023-01-02"]))

# Converting the calendar to all_leap changes the time coordinates to use cftime instead of datetime64.
ds = ds.convert_calendar("all_leap", missing=np.nan).interpolate_na()

# Time coordinates are in cftime, and selecting dates with a list fails.
print(ds.time)
print(ds.sel(time=["2023-01-01", "2023-01-02"]))

Relevant log output

(geoscience) [michael@localhost ~]$ python minimal.py 
<xarray.DataArray 'time' (time: 16107)>
array(['1979-01-01T00:00:00.000000000', '1979-01-02T00:00:00.000000000',
       '1979-01-03T00:00:00.000000000', ..., '2023-02-03T00:00:00.000000000',
       '2023-02-04T00:00:00.000000000', '2023-02-05T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time           (time) datetime64[ns] 1979-01-01 1979-01-02 ... 2023-02-05
    sic_threshold  float32 ...
    lat            float32 ...
    lon            float32 ...
Attributes:
    standard_name:          time
    long_name:              time of the observation (centered)
    coverage_content_type:  auxiliaryInformation
    axis:                   T
<xarray.Dataset>
Dimensions:        (time: 2, nv: 2)
Coordinates:
  * time           (time) datetime64[ns] 2023-01-01 2023-01-02
    sic_threshold  float32 ...
    lat            float32 ...
    lon            float32 ...
Dimensions without coordinates: nv
Data variables:
    lat_bounds     (nv) float32 ...
    lon_bounds     (nv) float32 ...
    area           |S64 ...
    sie            (time) float64 ...
    source         (time) float64 ...
Attributes: (12/35)
    title:                   Daily Northern Hemisphere Sea Ice Extent from EU...
    product_id:              OSI-420
    product_name:            OSI SAF Sea Ice Index
    product_status:          demonstration
    version:                 v2p1
    summary:                 Time series of Daily Sea Ice Extent (SIE) for No...
    ...                      ...
    distribution_statement:  Free
    copyright_statement:     Copyright 2023 EUMETSAT
    references:              Product User Manual for OSI-420, Lavergne et al....
    featureType:             timeSeries
    DODS.strlen:             2
    DODS.dimName:            nchar
<xarray.DataArray 'time' (time: 16140)>
array([cftime.DatetimeAllLeap(1979, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeAllLeap(1979, 1, 2, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeAllLeap(1979, 1, 3, 0, 0, 0, 0, has_year_zero=True), ...,
       cftime.DatetimeAllLeap(2023, 2, 3, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeAllLeap(2023, 2, 4, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeAllLeap(2023, 2, 5, 0, 0, 0, 0, has_year_zero=True)],
      dtype=object)
Coordinates:
  * time           (time) object 1979-01-01 00:00:00 ... 2023-02-05 00:00:00
    lat            float32 90.0
    lon            float32 0.0
    sic_threshold  float32 0.15
Attributes:
    standard_name:          time
    long_name:              time of the observation (centered)
    coverage_content_type:  auxiliaryInformation
    axis:                   T
Traceback (most recent call last):
  File "/var/home/michael/minimal.py", line 15, in <module>
    print(ds.sel(time=["2023-01-01", "2023-01-02"]))
  File "/var/home/michael/mambaforge/envs/geoscience/lib/python3.10/site-packages/xarray/core/dataset.py", line 2554, in sel
    query_results = map_index_queries(
  File "/var/home/michael/mambaforge/envs/geoscience/lib/python3.10/site-packages/xarray/core/indexing.py", line 183, in map_index_queries
    results.append(index.sel(labels, **options))  # type: ignore[call-arg]
  File "/var/home/michael/mambaforge/envs/geoscience/lib/python3.10/site-packages/xarray/core/indexes.py", line 480, in sel
    raise KeyError(f"not all values found in index {coord_name!r}")
KeyError: "not all values found in index 'time'"

Anything else we need to know?

No response

Environment

/var/home/michael/mambaforge/envs/geoscience/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 6.1.9-200.fc37.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.11.0
pandas: 1.5.1
numpy: 1.23.4
scipy: 1.9.3
netCDF4: 1.6.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.6
dask: None
distributed: None
matplotlib: 3.6.2
cartopy: 0.21.0
seaborn: 0.12.1
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: None
IPython: 8.6.0
sphinx: None

spencerkclark commented 1 year ago

Indeed, we do not currently support indexing a CFTimeIndex-backed array using a list of strings, but I think we would be happy to change that (e.g. we already accept a list of strings in interp for CFTimeIndex-backed arrays).

For the time being you should be able to use cftime.DatetimeAllLeap values themselves:

import cftime
ds.sel(time=[cftime.DatetimeAllLeap(2023, 1, 1), cftime.DatetimeAllLeap(2023, 1, 2)])

huaracheguarache commented 1 year ago

Ok, great! Thanks for the tip.
