Closed dcherian closed 3 months ago
Ah found an independent change that reduces number of tiny cohorts:
| Before [8e522b5a] <main> | After [c9aacf55] | Ratio | Benchmark (Parameter) |
|--------------------------+------------------+-------+--------------------------------------------------------|
| 6.52±0.1ms | 5.88±0.2ms | 0.9 | cohorts.ERA5MonthHourRechunked.time_graph_construct |
| 208±5μs | 183±4μs | 0.88 | cohorts.ERA5Google.time_find_group_cohorts |
| 6.55±0.04ms | 5.74±0.09ms | 0.88 | cohorts.OISST.time_find_group_cohorts |
| 8.39±0.3ms | 7.11±0.06ms | 0.85 | cohorts.RandomBigArray.time_graph_construct |
| 5.94±0.06ms | 4.90±0.02ms | 0.82 | cohorts.RandomBigArray.time_find_group_cohorts |
| 3.74±0.03ms | 2.97±0.2ms | 0.79 | cohorts.ERA5MonthHourRechunked.time_find_group_cohorts |
| 3.52±0.05ms | 2.69±0.01ms | 0.76 | cohorts.ERA5MonthHour.time_find_group_cohorts |
| 18798 | 12258 | 0.65 | cohorts.OISST.track_num_tasks |
| 16858 | 10571 | 0.63 | cohorts.OISST.track_num_tasks_optimized |
| 146±0.8ms | 78.5±2ms | 0.54 | cohorts.OISST.time_graph_construct |
| 243 | 117 | 0.48 | cohorts.OISST.track_num_cohorts |
| 978 | 474 | 0.48 | cohorts.OISST.track_num_layers |
Do this by skipping perfect cohorts that we already know about.
For the watershed problem, the hotspot is now the sparse matrix multiply to calculate containment. Down to 1.18s from 7s!
Unsure why the number of cohorts has gone up for the OISST benchmark
Other benchmarks don't capture the improvement.