Open durack1 opened 5 months ago
@durack1 – as a workaround, you can probably use ds.bounds.add_time_bounds() to create bounds after your temporal averaging operation. If the time point is important, you could also average the resulting time bounds and reassign them to ds.time.
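The bounds-averaging step suggested above can be sketched in plain Python (a minimal sketch: real xcdat time bounds are xarray DataArrays, simplified here to stdlib datetimes):

```python
from datetime import datetime

# Hypothetical monthly time bounds left after averaging the daily SST data
lower = datetime(1998, 1, 1, 0, 0)
upper = datetime(1998, 2, 1, 0, 0)

# The representative time point is the midpoint of the bounds interval
midpoint = lower + (upper - lower) / 2
print(midpoint)  # 1998-01-16 12:00:00
```

This reproduces the [1998-01-16 12:00] value expected in the issue body; in a real workflow the same arithmetic would be applied to the time_bnds array and the result assigned back to ds.time.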
To be sure, regarding the statement made above: "The data example is 1 through 31 January 1998 daily files of SST. I would expect for a monthly average over these days [1998-01-01 00:00 ... 1998-01-31 00:00]".

I think the monthly average should include all data representing intervals partially or fully within the bounds [1998-01-01 00:00 and 1998-02-01 00:00]. So, if the coordinate values represent the beginning of each daily mean (rather than the middle of the day), then the data values for the intervals associated with the coordinate values [1998-01-01 00:00 ... 1998-01-31 00:00] will simply be averaged together (no weighting required) to yield a monthly mean extending from 1998-01-01 00:00 to 1998-02-01 00:00. If, however, the coordinates represent the middle of each daily interval, then the first daily sample represents an interval from 1997-12-31 12:00 through 1998-01-01 12:00, and the monthly mean will include 32 daily samples (rather than 31), with the first and last samples included but with half the "weight" of the others. There is further discussion of this at https://github.com/xCDAT/xcdat/issues/565.
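The half-weight scheme described above can be illustrated with NumPy (a sketch of the weighting logic only, not xcdat code; the daily values are synthetic stand-ins for daily-mean SST):

```python
import numpy as np

# 32 daily-mean samples whose intervals overlap January 1998 when the
# coordinates mark the middle of each daily interval
# (intervals running 1997-12-31 12:00 .. 1998-01-31 12:00)
daily = np.arange(32, dtype=float)  # synthetic stand-in for daily-mean SST

# The first and last intervals only half-overlap the month,
# so they contribute with half the weight of the interior days
weights = np.ones(32)
weights[0] = weights[-1] = 0.5
assert weights.sum() == 31.0  # total weight equals the 31 days of January

monthly_mean = np.average(daily, weights=weights)
```

The same idea generalizes to any target interval: each sample is weighted by the fraction of its interval that falls inside the averaging bounds.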
This is essentially the same as #565. PRs welcome.
Agreed, this is effectively a dupe, although this issue focuses on daily -> monthly vs. monthly -> annual in #565.

duplicate label assigned
#565 also seems to be a duplicate of #534, which is related to #529.
@pochedls it seems in #529 there was an xarray version dependence. Is there a test that would catch issues in the case that upstream (xarray) changes impact the temporal_average function?
> @durack1 – as a workaround, you can probably use ds.bounds.add_time_bounds() to create bounds after your temporal averaging operation. If the time point is important, you could also average the resulting time bounds and reassign them to ds.time.
I had tried this (in the Notebook) but got the error:

ValueError: Cannot generate bounds for coordinate variable 'time' which has a length <= 1 (singleton).

The code I attempted was ds.bounds.add_time_bounds(method="freq", freq="month") – not sure if this was some noob error.
I think it is generally not possible to automatically generate bounds if there is only one time value, but I think it could be done in this case (you are specifying the frequency). This would require revisiting this block of code.
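Since the frequency is known, the bounds for a singleton time axis can also be attached by hand. A minimal sketch of the idea using plain NumPy (the variable names and the commented Dataset assignment are illustrative, not xcdat API):

```python
import numpy as np

# The singleton time coordinate left behind by the monthly-averaging step
time = np.array(["1998-01-01T00:00"], dtype="datetime64[m]")

# With a known frequency ("month"), the bounds can be written explicitly
# rather than inferred from neighbouring time points
time_bnds = np.array([["1998-01-01T00:00", "1998-02-01T00:00"]],
                     dtype="datetime64[m]")

# In an xarray Dataset this would be attached along the lines of:
#   ds["time_bnds"] = (("time", "bnds"), time_bnds)
#   ds["time"].attrs["bounds"] = "time_bnds"
print(time_bnds.shape)  # (1, 2)
```

Automating this inside add_time_bounds for the singleton case would amount to the same construction, driven by the user-supplied freq argument.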
What happened?
I have been attempting to generate a monthly mean from daily data, and expected the resulting time value to be an index at the middle of the month. The data example is 1 through 31 January 1998 daily files of SST. For a monthly average over these days [1998-01-01 00:00 ... 1998-01-31 00:00], I would expect the new time value to represent the mean of the period, e.g. [1998-01-16 12:00], whereas I get [1998-01-01 00:00], and no time_bounds assigned.

What did you expect to happen? Are there any possible answers you came across?
I expected a new monthly mean value to have the time axis value [1998-01-16 12:00], which also includes time_bounds [1998-01-01 00:00, 1998-02-01 00:00]. A working example is included in the Jupyter Notebook.
Minimal Complete Verifiable Example (MVCE)
Relevant log output
Anything else we need to know?
ping @taylor13, @gleckler1, @lee1043, with whom I have been discussing this behaviour
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.90.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.3
scipy: 1.11.4
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2023.12.1
distributed: 2023.12.1
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: 8.20.0
sphinx: None