xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0

BUG: Unable to reproduce the "Calculating Climatology and Departures from Time Series Data" Example #620

Status: Closed (mgrover1 closed this issue 6 months ago)

mgrover1 commented 6 months ago

What happened?

I tried to run through and reproduce the "Calculating Climatology and Departures from Time Series Data" example included in the documentation, but ran into a KeyError about missing time bounds.

What did you expect to happen? Are there any possible answers you came across?

I expected to be able to fully execute the "Calculating Climatology and Departures from Time Series Data" example, specifically the "Daily Climatology" portion.

Minimal Complete Verifiable Example (MVCE)

import xcdat

filepath2 = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc"
ds_hourly = xcdat.open_dataset(filepath2, chunks={"time": "auto"})

# Unit adjust (-273.15, K to C)
ds_hourly["tas"] = ds_hourly.tas - 273.15
ds_hourly

daily_climo = ds_hourly.temporal.climatology("tas", freq="day", weighted=True)

Relevant log output

KeyError                                  Traceback (most recent call last)
Cell In[14], line 1
----> 1 daily_climo = ds_hourly.temporal.climatology("tas", freq="day", weighted=True)

File ~/mambaforge/envs/xcdat-dev/lib/python3.12/site-packages/xcdat/temporal.py:512, in TemporalAccessor.climatology(self, data_var, freq, weighted, keep_weights, reference_period, season_config)
    371 """Returns a Dataset with the climatology of a data variable.
    372 
    373 Time bounds are used for generating weights to calculate weighted
   (...)
    508 }
    509 """
    510 self._set_data_var_attrs(data_var)
--> 512 return self._averager(
    513     data_var,
    514     "climatology",
    515     freq,
    516     weighted,
    517     keep_weights,
    518     reference_period,
    519     season_config,
    520 )

File ~/mambaforge/envs/xcdat-dev/lib/python3.12/site-packages/xcdat/temporal.py:762, in TemporalAccessor._averager(self, data_var, mode, freq, weighted, keep_weights, reference_period, season_config)
    760 # Get the data variable and the required time axis metadata.
    761 dv = _get_data_var(ds, data_var)
--> 762 time_bounds = ds.bounds.get_bounds("T", var_key=dv.name)
    764 if self._mode == "average":
    765     dv = self._average(dv, time_bounds)

File ~/mambaforge/envs/xcdat-dev/lib/python3.12/site-packages/xcdat/bounds.py:247, in BoundsAccessor.get_bounds(self, axis, var_key)
    244         bounds_keys = []
    246 if len(bounds_keys) == 0:
--> 247     raise KeyError(
    248         f"No bounds data variables were found for the '{axis}' axis. Make sure "
    249         "the dataset has bound data vars and their names match the 'bounds' "
    250         "attributes found on their related time coordinate variables. "
    251         "Alternatively, you can add bounds with `ds.bounds.add_missing_bounds()` "
    252         "or `ds.bounds.add_bounds()`."
    253     )
    255 bounds: Union[xr.Dataset, xr.DataArray] = self._dataset[
    256     bounds_keys if len(bounds_keys) > 1 else bounds_keys[0]
    257 ].copy()
    259 return bounds

KeyError: "No bounds data variables were found for the 'T' axis. Make sure the dataset has bound data vars and their names match the 'bounds' attributes found on their related time coordinate variables. Alternatively, you can add bounds with `ds.bounds.add_missing_bounds()` or `ds.bounds.add_bounds()`."

Anything else we need to know?

This is related to the JORS review

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.2.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.12.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.3.0
distributed: 2024.3.0
matplotlib: 3.8.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.3.0
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.22.2
sphinx: None

pochedls commented 6 months ago

Thank you for your review! We will fix the documentation here (and ensure the docs are up to date in general). The 3-hourly dataset used in the daily climatology example does not have time bounds. They can be added on open, as in the line below, or after open with ds_hourly = ds_hourly.bounds.add_missing_bounds(["T"]):

ds_hourly = xcdat.open_dataset(filepath2, chunks={"time": "auto"}, add_bounds=["T"])

xCDAT used to add the bounds automatically, but we decided that a user should make this decision.

tomvothecoder commented 6 months ago

This is now resolved in #623

tomvothecoder commented 6 months ago

I also checked if this issue was present in the other temporal averaging notebook, but it only uses the monthly file that already has bounds.