pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Calling `.isel()` on a timezone-aware dimension/index causes it to lose timezone information #9307

Open JamieTaylor-TUOS opened 4 months ago

JamieTaylor-TUOS commented 4 months ago

What happened?

With a Dataset/DataArray whose time dimension has an index with the timezone-aware dtype datetime64[ns, UTC], calling .isel() to select, say, the first element along that dimension returns a Dataset/DataArray whose time coordinate has reverted to datetime64[ns] (i.e. timezone-naive).

What did you expect to happen?

The resulting Dataset/DataArray should retain timezone awareness on the coordinate of the sliced time dimension/index and still use the datetime64[ns, UTC] dtype.
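For readers less familiar with the two dtypes named above, here is a minimal pandas/NumPy-only sketch of the difference. It is an illustration, not a diagnosis of where xarray drops the timezone: it only shows that the timezone lives in a pandas extension dtype, and that asking pandas for a plain NumPy `datetime64[ns]` array converts the values to UTC and discards the timezone. The variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Timezone-aware index, built the same way as in the report
# (strings with a +00:00 offset parse to UTC).
aware = pd.to_datetime([
    "2024-08-02T11:00:00+00:00",
    "2024-08-02T12:00:00+00:00",
])

# pandas keeps the timezone in an extension dtype (DatetimeTZDtype);
# NumPy's datetime64 has no timezone-aware counterpart.
is_aware = isinstance(aware.dtype, pd.DatetimeTZDtype)

# Requesting a plain NumPy datetime64 array converts the values to UTC
# and drops the timezone information.
as_numpy = aware.to_numpy(dtype="datetime64[ns]")

print(aware.dtype)     # timezone-aware pandas dtype
print(as_numpy.dtype)  # timezone-naive NumPy dtype
```

Any code path that goes through a plain NumPy datetime array will therefore produce timezone-naive values unless the timezone is carried separately.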

Minimal Complete Verifiable Example

import numpy as np
import pandas as pd
import xarray as xr

mydata = xr.DataArray(
    data=np.array([
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11]
    ]),
    coords={
        "category": ["A", "B", "C"],
        "time": pd.to_datetime([
            "2024-08-02T11:00:00+00:00",
            "2024-08-02T12:00:00+00:00",
            "2024-08-02T13:00:00+00:00",
            "2024-08-02T14:00:00+00:00"
        ])
    },
    name="volume"
)
print(mydata)
print("---------------------------")
print(f"time index dtype before calling `.isel()`: {mydata.indexes['time'].dtype}")
print(f"time coord dtype before calling `.isel()`: {mydata.coords['time'].dtype}")
print("---------------------------")
# Select position 0 along the time dimension. The time index is dropped,
# but the corresponding scalar coordinate remains.
subset = mydata.isel(time=0, drop=False)
print("---------------------------")
print(subset)
print("---------------------------")
print(f"time coord dtype after  calling `.isel()`: {subset.coords['time'].dtype}")

Relevant log output

<xarray.DataArray 'volume' (category: 3, time: 4)> Size: 96B
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * category  (category) <U1 12B 'A' 'B' 'C'
  * time      (time) object 32B 1722596400000000000 ... 1722607200000000000
---------------------------
time index dtype before calling `.isel()`: datetime64[ns, UTC]
time coord dtype before calling `.isel()`: object
---------------------------
---------------------------
<xarray.DataArray 'volume' (category: 3)> Size: 24B
array([0, 4, 8])
Coordinates:
  * category  (category) <U1 12B 'A' 'B' 'C'
    time      datetime64[ns] 8B 2024-08-02T11:00:00
---------------------------
time coord dtype after  calling `.isel()`: datetime64[ns]

Anything else we need to know?

Tested with version 2024.3.0 and also 2024.7.0.

Similar to #6416
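Until this is resolved, one possible stopgap (a sketch only, not something proposed in this thread) is to read the label from the pandas index before slicing, since the log output above shows the index itself still reports datetime64[ns, UTC]. This assumes that behaviour holds in the installed xarray version; the names `arr`, `position`, and `first_time` are made up for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small 1-D array with a timezone-aware time index, mirroring the report.
arr = xr.DataArray(
    data=np.arange(4),
    dims=["time"],
    coords={
        "time": pd.to_datetime([
            "2024-08-02T11:00:00+00:00",
            "2024-08-02T12:00:00+00:00",
            "2024-08-02T13:00:00+00:00",
            "2024-08-02T14:00:00+00:00",
        ])
    },
    name="volume",
)

position = 0

# Take the label from the underlying pandas index, which (per the log
# output in this report) keeps its timezone, instead of reading it back
# from the scalar coordinate left behind by .isel().
first_time = arr.indexes["time"][position]

subset = arr.isel(time=position)

print(first_time)     # pandas Timestamp taken from the index
print(first_time.tz)  # timezone carried by that Timestamp
```

This keeps the timezone-aware label alongside the sliced data without relying on the dtype of `subset.coords["time"]`.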

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 (main, Mar 1 2023, 18:26:19) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-1064-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2024.7.0
pandas: 2.2.2
numpy: 2.0.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.26.0
sphinx: None

max-sixty commented 4 months ago

Thanks for the excellent issue, @JamieTaylor-TUOS.

(I labeled this as "topic-cftime" as I don't think we have a "time but not necessarily cftime" label; tell me if this is not what we're intending, xarray team)