xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0

[Bug]: temporal.group_average produces wrong results for custom seasons that cross the year boundary #642

Open oliviermarti opened 2 months ago

oliviermarti commented 2 months ago

What happened?

For a custom season such as ["Dec", "Jan", "Feb"], xcdat averages the December of the same year with January and February, instead of using the December of the previous year as it should.

What did you expect to happen? Are there any possible answers you came across?

No response

Minimal Complete Verifiable Example (MVCE)

import numpy as np, xarray as xr, cftime, xcdat as xc

# Create a monthly time axis (mid-month dates, 3 years)
nyear = 3
time = []

for ny in np.arange (nyear) :
    for nm in np.arange (1,13) :
        time.append ( cftime.datetime (year=1900+ny , month=nm , day=15, hour=0, minute=0, second=0, calendar='gregorian', has_year_zero=False) )

time = cftime.date2num (time, units="seconds since 1900-01-01 00:00:00.000000", calendar='gregorian', has_year_zero=False, longdouble=False)
time = xr.DataArray ( time, dims=('time',), coords=(time,) )
time.attrs.update ( {
        'axis'         : "T",
        'standard_name': "time",
        'long_name'    : "Time axis",
        'time_origin'  : "1900-01-01 00:00:00",
        'units'        : "seconds since 1900-01-01 00:00:00.000000",
        'calendar'     : "gregorian" })

# Create a simple variable: 1, 2, ..., 36 (one value per month)

#Var = (np.arange (len(time))%12 + 1).astype(float)
Var = (np.arange (len(time)) + 1).astype(float)
Var = xr.DataArray (Var, dims=('time',), coords=(time,)  )
dd  = xr.Dataset ( {'Var':Var} )

# Write a fresh file (mode="w") so the example can be rerun
dd.to_netcdf ( 'toto.nc', mode="w" )

dc = xc.open_dataset ('toto.nc', use_cftime=True, decode_times=True).bounds.add_missing_bounds()

# 'Classical' three-month seasonal values are correct
result_1 = dc.temporal.group_average ( "Var", "season", season_config={"dec_mode": "DJF", "drop_incomplete_djf":False }).Var

# Using custom seasons: values for seasons crossing the year boundary are wrong
custom_seasons = [["Dec", "Jan", "Feb"], ["Mar", "Apr", "May"], ["Jun", "Jul", "Aug"], ["Sep", "Oct", "Nov"]]
#custom_seasons = [["Dec", "Jan", "Feb", "Mar"], ["Apr", "May", "Jun", "Jul"], ["Aug", "Sep", "Oct", "Nov"]]
#custom_seasons = [["Jun", "Jul", "Aug", "Sep"], ["Oct", "Nov", "Dec", "Jan"], ["Feb", "Mar", "Apr", "May"]]

result_2 = dc.temporal.group_average ( "Var", "season", season_config={"custom_seasons":custom_seasons, "drop_incomplete_djf":False} ).Var

print ( result_1.values )
print ( result_2.values )

Relevant log output

Both computations should give the same result, but we get:

[ 1.47457627  4.          7.01086957 10.         12.96666667 16.
 19.01086957 22.         24.96666667 28.         31.01086957 34.
 36.        ]
[ 5.1         4.          7.01086957 10.         17.1        16.
 19.01086957 22.         29.1        28.         31.01086957 34.        ]
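The reported numbers can be reproduced by hand, which pins down the grouping error. This sketch assumes group_average weights each month by its length in days (the printed values match that assumption exactly) and uses `calendar.monthrange`, which follows the proleptic Gregorian calendar and so agrees with cftime's "gregorian" calendar for 1900 to 1902. `var` and `weighted_mean` are hypothetical helpers, not xCDAT functions.

```python
import calendar


def var(year, month):
    # Value of Var in the MVCE: 1 for Jan 1900, 2 for Feb 1900, ..., 36 for Dec 1902
    return (year - 1900) * 12 + month


def weighted_mean(months):
    # Average Var over (year, month) pairs, weighting each month by its length in days
    days = [calendar.monthrange(y, m)[1] for y, m in months]
    values = [var(y, m) for y, m in months]
    return sum(d * v for d, v in zip(days, values)) / sum(days)


# dec_mode="DJF": December of the previous year is grouped with Jan and Feb
print(round(weighted_mean([(1900, 1), (1900, 2)]), 8))              # 1.47457627 (incomplete DJF 1900)
print(round(weighted_mean([(1900, 12), (1901, 1), (1901, 2)]), 8))  # 12.96666667

# custom_seasons: December of the *same* year is grouped with Jan and Feb
print(round(weighted_mean([(1900, 1), (1900, 2), (1900, 12)]), 8))  # 5.1
print(round(weighted_mean([(1901, 1), (1901, 2), (1901, 12)]), 8))  # 17.1
```

So the custom-season values 5.1 and 17.1 combine Dec 1900 and Dec 1901 with January and February of the same calendar year, whereas the correct DJF 1901 value (12.96666667) uses Dec 1900 with Jan and Feb 1901.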

Anything else we need to know?

No response

Environment

xr.show_versions()
/Users/marti/mambaforge/envs/FULL/lib/python3.11/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:51:20) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.3.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: 3.8.1
bottleneck: 1.3.8
dask: 2024.4.1
distributed: 2024.4.1
matplotlib: 3.8.3
cartopy: 0.22.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.2.0
cupy: None
pint: 0.23
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.1.1
pip: 24.0
conda: None
pytest: 8.0.2
mypy: None
IPython: 8.22.2
sphinx: 7.2.6

tomvothecoder commented 2 months ago

Hey @oliviermarti, thanks for opening this GitHub issue.

xCDAT currently does not support custom seasons that span the calendar year boundary. Another user opened GitHub Issue #416, which I believe describes the same problem as this issue.

PR #423 is intended to expand the capabilities of custom seasons, including:

  1. Adding support for seasons that span calendar years
  2. Detecting and dropping incomplete seasons (not just DJF)
  3. Removing the requirement for all 12 months to be used for custom seasons
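For illustration only, the three capabilities listed above can be sketched in plain Python for monthly data. This is not the implementation in PR #423 and not xCDAT's API: `custom_season_means` and its arguments are hypothetical. The sketch assumes (year, month, value) records, weights each month by its length in days via `calendar.monthrange` (proleptic Gregorian), treats a season as spanning the year boundary when its month numbers decrease (e.g. Dec to Jan), and drops a season when any of its months is missing.

```python
import calendar
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def custom_season_means(records, seasons, drop_incomplete=True):
    """Days-in-month weighted mean per (season name, season year). Hypothetical helper.

    records: iterable of (year, month, value) monthly samples.
    seasons: lists of month abbreviations; they need not cover all 12 months.
    """
    lookup = {}      # month number -> (season name, year offset)
    season_len = {}  # season name -> number of months in the season
    for season in seasons:
        nums = [MONTHS.index(m) + 1 for m in season]
        name = "".join(m[0] for m in season)
        season_len[name] = len(nums)
        # Index where the month number drops (e.g. Dec -> Jan); 0 if the season does not wrap.
        wrap = next((i for i in range(1, len(nums)) if nums[i] < nums[i - 1]), 0)
        for i, n in enumerate(nums):
            # Months before the wrap come from the previous calendar year,
            # so they are attributed to the following season year.
            lookup[n] = (name, 1 if i < wrap else 0)

    groups = defaultdict(list)
    for year, month, value in records:
        if month not in lookup:
            continue  # month not used by any custom season
        name, offset = lookup[month]
        days = calendar.monthrange(year, month)[1]
        groups[(name, year + offset)].append((days, value))

    means = {}
    for (name, season_year), items in groups.items():
        if drop_incomplete and len(items) < season_len[name]:
            continue  # e.g. DJF 1900 has no Dec 1899 in the data
        total_days = sum(days for days, _ in items)
        means[(name, season_year)] = sum(d * v for d, v in items) / total_days
    return means
```

With the MVCE's data (Var = 1 to 36 for Jan 1900 to Dec 1902) and the four three-month custom seasons, this gives DJF 1901 ≈ 12.9667, matching the dec_mode="DJF" result, and drops the incomplete DJF 1900 and DJF 1903; with drop_incomplete=False all 13 groups are kept. Passing only [["Dec", "Jan", "Feb"]] leaves the other months unused.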