Open tomvothecoder opened 2 years ago
I saw the ping at https://github.com/pydata/xarray/issues/6610. Let me know if you run into issues or have questions.
Thanks @dcherian! I'm looking forward to trying out flox.
`xarray >= 2024.09.0` now supports grouping by multiple variables: https://xarray.dev/blog/multiple-groupers
Is your feature request related to a problem?
Currently, grouping by multiple coordinates (e.g., `time.year` and `time.season`) requires creating a new set of coordinates before grouping due to the xarray limitations described below.

Related code in `xcdat` for temporal grouping: https://github.com/xCDAT/xcdat/blob/c9bcbcdb66af916958a79a33177bc43d478e4036/xcdat/temporal.py#L1266-L1322

Current temporal averaging logic (workaround for multi-variable grouping):

2. [...] `xarray.DataArray` to a `pandas.DataFrame`,
   a. Keep only the DataFrame columns needed for grouping (e.g., "year" and "season" for seasonal group averages), essentially "labeling" coordinates with their groups
   b. Process the DataFrame, including:
      - [...] `cftime` coordinates (season strings aren't supported in `cftime`/`datetime` objects)
3. [...] `cftime` objects to represent new time coordinates

Describe the solution you'd like
It would be simpler, cleaner, and probably more performant to call something like `.groupby(["time.year", "time.season"])` instead (waiting on `xarray` to support this with `flox`). This solution would remove much of the internal complexity in the temporal averaging API.

We might be able to achieve this using `flox` directly.

Additionally, we would need to figure out a way to easily perform the time-coordinate processing steps described in 2b directly on xarray objects if we move away from using `pandas.DataFrame`.

Describe alternatives you've considered
Multi-variable grouping was originally done using `pd.MultiIndex`, but we shifted away from this approach because this object cannot be written out to `netcdf4`. Also, `pd.MultiIndex` is not the standard object type for representing time coordinates in xarray. The standard object types are `np.datetime64` and `cftime`.

Additional context
Future solution through `xarray` + `flox`:

- [...] `xarray` version in https://github.com/pydata/xarray/issues/6610, we should be able to do this.
- [...] `.groupby()` performance significantly.