xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0

[Feature]: More Sophisticated Bounds Handling in Temporal Averaging Operations #594

Open pochedls opened 5 months ago

pochedls commented 5 months ago

Is your feature request related to a problem?

xcdat temporal averaging operations currently bin data by the labelled time point, with the weights derived from the difference in the time bounds. This works for most conventional climate data: a time point of 2020-01-16 12:00 with bounds of [2020-01-01 00:00, 2020-02-01 00:00] would be given 31 days of weight in January (e.g., when creating an annual average or climatology), which is correct.

There are reasonable instances where this wouldn't work. Imagine pentad data with a time point of 2020-02-02 12:00 with bounds of [2020-01-31 00:00, 2020-02-05 00:00]. This time point should be given one day of weight in January and four days of weight in February. The current algorithm (e.g., for monthly averaging) assigns all five days of weight to February (the month of the labelled time point).
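To make the arithmetic concrete, here is a minimal sketch of the behavior described above. It is not xcdat's actual code: `label_month_weight` is a hypothetical helper, and it assumes plain `datetime` objects on a standard calendar rather than xarray/cftime objects.

```python
from datetime import datetime


def label_month_weight(time_point, lower, upper):
    """Return ((year, month), weight_in_days) for one time step.

    Hypothetical helper: the weight is the width of the time bounds,
    and all of it is assigned to the month of the labelled time point.
    """
    weight_days = (upper - lower).total_seconds() / 86400.0
    return (time_point.year, time_point.month), weight_days


# Conventional monthly data: the bounds sit entirely inside January.
monthly = label_month_weight(
    datetime(2020, 1, 16, 12), datetime(2020, 1, 1), datetime(2020, 2, 1)
)
print(monthly)  # ((2020, 1), 31.0)

# Pentad data: the bounds straddle the January/February boundary,
# but all five days are attributed to February.
pentad = label_month_weight(
    datetime(2020, 2, 2, 12), datetime(2020, 1, 31), datetime(2020, 2, 5)
)
print(pentad)  # ((2020, 2), 5.0)
```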

Describe the solution you'd like

Weights should be assigned based on the time period that they fall into. This would mean that a given time point can contribute to averages in more than one time interval.
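A rough sketch of what this could look like for monthly bins is below. It is only an illustration under stated assumptions: `split_weight_by_month` is a hypothetical function (not part of xcdat), it uses plain `datetime` objects on the standard calendar, and it ignores non-standard calendars such as those handled through cftime.

```python
from datetime import datetime


def split_weight_by_month(lower, upper):
    """Split the interval [lower, upper) into days of weight per (year, month).

    Hypothetical helper: walks from `lower` to `upper`, cutting the
    interval at each month boundary it crosses.
    """
    weights = {}
    start = lower
    while start < upper:
        # First instant of the month following `start`.
        if start.month == 12:
            next_month = datetime(start.year + 1, 1, 1)
        else:
            next_month = datetime(start.year, start.month + 1, 1)
        end = min(next_month, upper)
        weights[(start.year, start.month)] = (end - start).total_seconds() / 86400.0
        start = end
    return weights


# Pentad example: one day falls in January and four in February.
pentad = split_weight_by_month(datetime(2020, 1, 31), datetime(2020, 2, 5))
print(pentad)  # {(2020, 1): 1.0, (2020, 2): 4.0}

# Conventional monthly bounds are unchanged: all 31 days stay in January.
monthly = split_weight_by_month(datetime(2020, 1, 1), datetime(2020, 2, 1))
print(monthly)  # {(2020, 1): 31.0}
```

With weights of this form, a single time step would appear in more than one group, so the grouping step could no longer rely on one label per time point.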

Describe alternatives you've considered

Solutions for the time being would be to update documentation to note that:

Additional context

I'm not sure whether cdms / cdutil covers this case; it would be helpful to determine what cdutil does.

This seems like it could be a challenging issue to address in general and might require a major refactor of the logic used for existing temporal averaging calculations.

tomvothecoder commented 5 months ago

I opened up PR #601 to implement the documentation updates suggested in your alternative solution.