xarray-contrib / flox

Fast & furious GroupBy operations for dask.array
https://flox.readthedocs.io
Apache License 2.0

Use dot product for containment #306

Closed dcherian closed 10 months ago

dcherian commented 10 months ago

Closes #304

There is a regression for one test case. I think this is because I currently re-iterate over all labels, rather than iterating over the known cohorts and then merging those.

| Before [ce89b381] <main>   | After [4c1fb7aa] <dotproduct>   |   Ratio | Benchmark (Parameter)                                  |
|----------------------------|---------------------------------|---------|--------------------------------------------------------|
| 2.37±0.01ms                | 4.03±0.02ms                     |    1.7  | cohorts.ERA5MonthHour.time_find_group_cohorts          |
| 2.68±0.05ms                | 4.32±0.02ms                     |    1.61 | cohorts.ERA5MonthHourRechunked.time_find_group_cohorts |
| 8.30±0.06ms                | 12.6±0.04ms                     |    1.52 | cohorts.ERA5MonthHourRechunked.time_graph_construct    |
| 7.99±0.02ms                | 12.0±0.07ms                     |    1.51 | cohorts.ERA5MonthHour.time_graph_construct             |
| 2.35±0.01ms                | 2.84±0.01ms                     |    1.21 | cohorts.PerfectMonthly.time_graph_construct            |
| 17.6±0.1ms                 | 19.6±0.1ms                      |    1.11 | cohorts.ERA5Google.time_graph_construct                |
| 2.89±0.01ms                | 2.62±0ms                        |    0.91 | cohorts.ERA5Google.time_find_group_cohorts             |
| 723±2μs                    | 625±4μs                         |    0.86 | cohorts.PerfectMonthly.time_find_group_cohorts         |
| 40.0±0.2ms                 | 15.4±0.05ms                     |    0.38 | cohorts.NWMMidwest.time_find_group_cohorts             |
| 11.4±0.06ms                | 3.10±0.01ms                     |    0.27 | cohorts.ERA5DayOfYear.time_find_group_cohorts          |
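
For readers unfamiliar with the idea in the title, here is a minimal sketch of how containment between labels could be computed with a dot product. It is not taken from flox's source: the function name `containment`, the dense boolean `bitmask` layout (chunks along rows, labels along columns), and the toy data are all assumptions made for illustration, and a real implementation would likely use a sparse matrix.

```python
import numpy as np


def containment(bitmask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pairwise containment of labels across chunks.

    bitmask[c, l] is True when chunk c holds at least one element of label l.
    Assumes every label appears in at least one chunk (no division by zero).
    """
    b = bitmask.astype(np.int64)
    # overlap[i, j] = number of chunks that hold both label i and label j
    overlap = b.T @ b
    # number of chunks that hold each label
    chunks_per_label = b.sum(axis=0)
    # result[i, j] = fraction of label i's chunks that also hold label j
    return overlap / chunks_per_label[:, None]


# Toy layout (assumed): 4 chunks, 3 labels
bitmask = np.array(
    [
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 1],
        [0, 0, 1],
    ],
    dtype=bool,
)
C = containment(bitmask)
```

In this toy layout, label 1 lives entirely inside the chunks of label 0 (`C[1, 0] == 1.0`), while label 2 shares no chunks with the others, so the first two labels would be natural candidates for the same cohort.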