xarray-contrib / flox

Fast & furious GroupBy operations for dask.array
https://flox.readthedocs.io
Apache License 2.0

More Groupers / user stories / strategies #255

Open dcherian opened 1 year ago

dcherian commented 1 year ago

We need ~more discussion of strategies to label groups or perhaps just~ more convenient Grouper objects.

dcherian commented 1 year ago

Potential SeasonGrouper syntax:

SeasonGrouper(["JF", "MAM", "JJAS", "OND"])
SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])

The list would be passed as expected_groups so that the output groups appear in the order given.

Here's an interesting thread on SeasonGrouper design: https://github.com/xCDAT/xcdat/issues/416

ds.temporal.group_average(
    'pr',
    freq='season',
    season_config={
        'dec_mode': 'DJF',
        'drop_incomplete_djf': True,  # Or drop_incomplete_season
        'custom_seasons': ['Nov', 'Dec', 'Jan', 'Feb', 'Mar']
    }
)

All of this is a factorization problem. I like the idea of custom Grouper objects that each implement their own factorization.
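To make the "factorization" framing concrete, here is a rough pure-Python sketch of how season strings such as "DJFM" could be turned into integer group codes. The helper names `season_to_months` and `factorize_months` are hypothetical and are not part of flox or xarray. The sketch assumes each season is a run of consecutive month initials, and it returns a tuple of codes per element because seasons like "DJFM" and "MAMJ" overlap.

```python
MONTH_INITIALS = "JFMAMJJASOND"  # Jan..Dec, one letter per month


def season_to_months(season: str) -> list[int]:
    """Return 1-based month numbers for a season string like "DJF".

    Assumption: the season is a run of consecutive month initials.
    Ambiguous strings (e.g. "J") resolve to the first match in the year.
    """
    doubled = MONTH_INITIALS * 2  # lets seasons wrap around December
    start = doubled.find(season)
    if start == -1 or not season:
        raise ValueError(f"not a run of consecutive months: {season!r}")
    return [(start + i) % 12 + 1 for i in range(len(season))]


def factorize_months(months, seasons):
    """Map each month number to the codes of the seasons containing it.

    The code of a season is its position in `seasons`, so the output
    order follows the user-supplied list. A month in no season gets
    an empty tuple.
    """
    lookup = {}
    for code, season in enumerate(seasons):
        for month in season_to_months(season):
            lookup.setdefault(month, []).append(code)
    return [tuple(lookup.get(month, ())) for month in months]


# Non-overlapping seasons: exactly one code per month.
print(factorize_months([1, 3, 6, 12], ["JF", "MAM", "JJAS", "OND"]))
# Overlapping seasons: March belongs to both "DJFM" and "MAMJ".
print(factorize_months([3, 12], ["DJFM", "MAMJ", "JJAS", "SOND"]))
```

A real Grouper would also need to handle the year a wrapped season such as DJF belongs to, and whether to drop incomplete seasons. Both are left out here.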

dcherian commented 10 months ago

Some prior art from polars:

  1. https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.group_by_dynamic.html

    Group based on a time value (or index value of type Int32, Int64).

    Time windows are calculated and rows are assigned to windows. Different from a normal group by is that a row can be member of multiple groups. By default, the windows look like:

    [start, start + period)
    
    [start + every, start + every + period)
    
    [start + 2*every, start + 2*every + period)
    
    …

    where start is determined by start_by, offset, and every (see parameter descriptions below).

  2. https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.rolling.html#polars.DataFrame.rolling (I don't think this is normal rolling?)
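The window definition quoted in item 1 can be sketched as a small membership function: given a value, it returns the indices k of every window [start + k*every, start + k*every + period) that contains it. This is an illustration of the arithmetic only, not polars code. The function name `window_indices` is hypothetical, `start` is taken as given rather than derived from start_by and offset, and integer inputs are assumed.

```python
def window_indices(t: int, start: int, every: int, period: int) -> list[int]:
    """Return indices k >= 0 of the windows that contain t.

    Window k covers [start + k*every, start + k*every + period).
    Assumes integer inputs with every > 0 and period > 0.
    """
    # Largest k whose left edge is <= t.
    k_max = (t - start) // every
    # Smallest k whose right edge is > t.
    k_min = (t - start - period) // every + 1
    return list(range(max(k_min, 0), k_max + 1))


# Overlapping windows (period > every): a value can land in several groups.
print(window_indices(5, start=0, every=2, period=4))
# Tumbling windows (period == every): each value lands in exactly one group.
print(window_indices(7, start=0, every=3, period=3))
```

When period > every, a row maps to more than one group, which is the point where this differs from an ordinary factorization into a single integer code per row.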