uwdata / arquero

Query processing and transformation of array-backed data tables.
https://idl.uw.edu/arquero
BSD 3-Clause "New" or "Revised" License
1.22k stars 64 forks source link

COUNTIF-like aggregate function #336

Closed TomFevrier closed 10 months ago

TomFevrier commented 10 months ago

Hi! I am having trouble aggregating data based on count filtered by a predicate (similar to how COUNTIF works in spreadsheets).

I tried using array_agg like this:

df.groupby(['station_id', 'month'])
  .rollup({
     nb_days_25: op.array_agg('tg').filter((v) => v >= 25).length
  })

but it doesn't work this way.

Then, I tried creating a custom countif aggregate function that takes the predicate as a parameter, like this:

const countIf = {
  create: (predicate) => ({
    init: (state) => state.values = true,
    add: () => {},
    rem: () => {},
    value: (state) => state.list.values().filter(predicate).length
  }),
  param: [1, 1]
}

addAggregateFunction('countif', countIf);

// (...)

df.groupby(['station_id', 'month'])
  .rollup({
     nb_days_25: op.countif('tg', (v) => v >= 25)
  })

but it doesn't work neither, as functions passed as parameters are ignored (undefined is not a function).

For now, the only solution I found was using a very specific aggregate function like:

const countGreaterThan = {
  create: (threshold) => ({
    init: (state) => state.values = true,
    add: () => {},
    rem: () => {},
    value: (state) => state.list.values().filter((v) => v >= threshold).length
  }),
  param: [1, 1]
}

addAggregateFunction('count_gt', countGreaterThan);

// (...)

df.groupby(['station_id', 'month'])
  .rollup({
     nb_days_25: op.count_gt('tg', 25)
  })

Is there any other way to do this? Am I using array_agg or custom aggregate functions wrong? I am still new to Arquero.

Thank you very much for your help!