xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0
101 stars 11 forks source link

[Bug]: `add_missing_bounds` not working when time type is `numpy.timedelta64` #659

Closed lee1043 closed 3 weeks ago

lee1043 commented 4 weeks ago

What happened?

xarray's tutorial sample data for air temperature has its time axis typed numpy.timedelta64 while numpy.timedelta is expected from xcdat

What did you expect to happen? Are there are possible answers you came across?

The issue can be addressed by converting numpy.timedelta64 to numpy.timedelta if needed when checking the difference between the interval of the second and first time steps.

Minimal Complete Verifiable Example (MVCE)

import xcdat as xc
import xarray as xr

# Load xarray tutorial sample data
ds = xr.tutorial.load_dataset("air_temperature")
data_var = "air"

# Add axis attributes to lat / lon axes, which was missing in the sample data
new_y = ds.lat.assign_attrs(axis='Y')
new_x = ds.lon.assign_attrs(axis='X')
ds = ds.assign_coords(lat=new_y, lon=new_x)

# Add bounds
ds = ds.bounds.add_missing_bounds(["X", "Y", "T"])

Relevant log output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[7], line 1
----> 1 ds = ds.bounds.add_missing_bounds(["X", "Y", "T"])

File ~/mambaforge/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/bounds.py:189, in BoundsAccessor.add_missing_bounds(self, axes)
    187         bounds = create_bounds(axis, coord)
    188     elif axis == "T":
--> 189         bounds = self._create_time_bounds(coord)
    190 except (ValueError, TypeError) as e:
    191     logger.warning(e)

File ~/mambaforge/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/bounds.py:607, in BoundsAccessor._create_time_bounds(self, time, freq, daily_subfreq, end_of_month)
    605 if daily_subfreq is None:
    606     diff = time.values[1] - time.values[0]
--> 607     hrs = diff.seconds / 3600
    608     daily_subfreq = int(24 / hrs)  # type: ignore
    610 time_bnds = self._create_daily_time_bounds(timesteps, obj_type, freq=daily_subfreq)  # type: ignore

AttributeError: 'numpy.timedelta64' object has no attribute 'seconds'

Anything else we need to know?

Narrow down to the issue:

time = ds.time
diff = time.values[1] - time.values[0]
diff.seconds

'numpy.timedelta64' object has no attribute 'seconds'

type(diff)

numpy.timedelta64

Environment

INSTALLED VERSIONS

commit: None python: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:51:20) [Clang 16.0.6 ] python-bits: 64 OS: Darwin OS-release: 23.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: 1.14.3 libnetcdf: 4.9.2

xarray: 2024.3.0 pandas: 2.2.1 numpy: 1.26.4 scipy: 1.12.0 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.3 nc_time_axis: 1.4.1 iris: None bottleneck: None dask: 2024.3.1 distributed: 2024.3.1 matplotlib: 3.8.3 cartopy: None seaborn: None numbagg: None fsspec: 2024.3.1 cupy: None pint: None sparse: 0.15.1 flox: None numpy_groupies: None setuptools: 69.2.0 pip: 24.0 conda: None pytest: 8.1.1 mypy: 1.4.0 IPython: 8.22.2 sphinx: 7.2.6

lee1043 commented 4 weeks ago

This will be resolved by #655, with its commit https://github.com/xCDAT/xcdat/pull/655/commits/27b13bd010bb2be4aeff3f85faf3d3fc59058381.

pochedls commented 4 weeks ago

@lee1043 – what type of objects are in the time axis itself (I am guessing they are not np.timedelta64)?

Note that xcdat's missing bounds functionality expects cftime of datetime64 objects for the time axis (here).

Is it possible to convert the time axis before using add_missing_bounds? Something like this could work:

my_dataset['time'] = X(my_dataset.time) (where X is some pandas/numpy/cftime operation that might convert the time axis)

I think this would be preferable to adding in pandas functionality in the missing bounds routines. If you do add this functionality, it would be ideal to do it in a separate PR.

pochedls commented 4 weeks ago

@lee1043 : does this resolve this issue:

ds = xr.tutorial.load_dataset("air_temperature", use_cftime=True)

lee1043 commented 4 weeks ago

@pochedls Thank you for the thoughts. add_missing_bounds is an example here to show the issue, while the initial intent was to do temporal averaging. type(ds.time.values[0]) shows they are numpy.datetime64.

Good to learn about use_cftime=True, which resolved the issue.

The current bounds.py already has a part to convert object to timedelta at below, so to make it consistent I think it won't be harmful to add it where it was missed (as I did in https://github.com/xCDAT/xcdat/commit/27b13bd010bb2be4aeff3f85faf3d3fc59058381).

https://github.com/xCDAT/xcdat/blob/223869e2a286ae403427e26921c749f60b0d4f6b/xcdat/bounds.py#L916

I however see @tomvothecoder left a comment below the above line

    # FIXME: These lines produces the warning: `PerformanceWarning:
    # Adding/subtracting object-dtype array to TimedeltaArray not
    # vectorized` after converting diffs to `timedelta`. I (Tom) was not
    # able to find an alternative, vectorized solution at the time of this
    # implementation.

I will open a separate PR as suggested.