Implement `Serie.group_by` and aggregate functions

s-leroux commented 3 months ago

Implement Serie.group_by and aggregate functions.

Note that the group_by function has a different semantic from SQL: here, rows are grouped by a consecutive sequence of values matching a condition.

Required by #27.

Possible syntax:

serie.group_by(<cond>, <aggregate fct, ...>)

Where <cond> is an arbitrary column expression (possibly a single column name) and <aggregate fct, ...> is a non-empty sequence of aggregate functions.

Example:

serie.group_by(
    (fc.gt, "CLOSE", "OPEN"),
    (ag.first, "NAME", "ID", "DATE"),
    (ag.count, (fc.named("COUNT"), "NAME")),
    (ag.avg, (fc.named("AVG PRICE"), "CLOSE")),
)

s-leroux commented 3 months ago

There is no obvious way to specify the aggregate function for the index.

See #32

s-leroux commented 3 months ago

There is no obvious way to specify the aggregate function for the index.

At this point, I consider making the index definition mandatory in the group_by clause:

serie.group_by(
  "MONTH",
  (ag.first, "DATE"), # <-- the first entry will implicitly become the aggregate index
  (ag.avg, fc.named("AVG PRICE"), "PRICE")
)

Regarding the sort_by clause, a conservative approach would be to allow only column names as specifiers and not full-fledged column expressions. Maybe we may even reduce that to only one column specifier. For complex sort operations this would require synthesizing the required column in a preceding select clause:

serie.select(
  (fc.named("MD"), fc.concat, "MONTH", "DATE"),
).sort_by(
  "MD",
)

s-leroux commented 3 months ago

Implemented in c3b851d0e2ba340f33a2fb96aa8cd856dc1fd4e6.

s-leroux / fin

Implement `Serie.group_by` and aggregate functions #30