Aggregations and Grouping (aka drill-down/roll-up with grand/sub-totals)

nicolaskruchten commented 3 years ago

Many similar tables i.e. AG-Grid have support for showing aggregations of rows, rather than raw rows: https://www.ag-grid.com/javascript-grid-grouping/ or Reactable in R-Shiny-land https://glin.github.io/reactable/articles/examples.html#grouping-and-aggregation ... a typical use-case for this is financial reporting, where the columns are months and the rows are departments broken down by type of spending, say.

The way this works, conceptually, is that for a given dataset with columns "country, city, population", instead of displaying all the rows, you would say "summarize population via sum, and group by country" then you end up with one row displayed per country, and a derived column which is the sum of population. We could include more than one column to be aggregated (i.e. add GDP) and other aggregation functions (e.g. mean, median etc) and even complex totally-derived "measure" columns that don't map 1-1 with input columns, and which are the result of applying functions on multiple existing columns (e.g. sum(col A)/sum(col B)). The idea of a "summary column" to do row-wise aggregation fits into this also: you might easily imagine wanting to have one column on the far right that represents the sum of all the other columns, if all the other columns are sums of their rows.

Once a table knows how to do the above, you could allow it to "drill down" via a "drill-down path" (aka hierarchy or tree-like view) like [country, city] such that clicking on a country would cause per-city rows to appear underneath and slightly indented. The drill-down path could be deeper than 2 levels, of course. Given the ability to drill down, we can also imagine the ability to show expanded groups on initial render, and allow the ability to "roll up" on demand.

This implicitly defines "row groups" i.e. all the cities are grouped by country. When a group is expanded, we could bring in the notion of a "subtotal row" which would do the same thing as a total row/footer on a per-group basis. The notion of a table-wide footer is a specialization of this, essentially (see #433, #627).

For completeness, the same kind of thing might be wanted column-wise i.e. column-groups, although in my experience this is rarer.

So the API here basically looks something like:

some "derived column" API that allows for on-the-fly computation of columns based on aggregations of other columns... this can be done in a backwards-compatible way by saying that every column today is "derived from itself" via the "first" aggregator ;)
some way of passing in certain columns that are not displayed i.e. they are passed in solely to serve as the basis for other derived columns
some "drill down path" prop to define the drilldown/rollup, and some special column that indicates "this is the drill-down path control column"
some set of props to represent the current state of the drilldown/rollup (i.e. which groups are expanded/collapsed)
some set of props to control grand/sub-totals

How would this intersect with:

client vs server computation? lots of options here: client-only, server-only, isomorphic
filtering? presumably the filters apply the underlying rows and the aggregations apply to the filtered set... might want some way of indicating that the aggregation doesn't apply to all the possible rows i.e. via italics or something
editable columns? presumably you can only edit underlying rows and not their aggregations
selections? selecting a group selects all the underlying rows?
formatting? special formatting directives per drill-down level, or just leaf-vs-non-leaf
exporting? some specific set of rules for how to format an export

keywords for search engines and searching in other places like the forum: hierarchy, tree, groups, drill-down, roll-up, aggregate, total, subtotal, grand total, expand/collapse, details/summary

nicolaskruchten commented 3 years ago

Related forum posts:

nicolaskruchten commented 3 years ago

My original PivotTable.js doesn't support this out of the box but it does supports pluggable renderers and someone made a plugin for it called Subtotal.js which does both row- and column-wise aggregation with rollup/drilldown etc.

My React port of Pivottable.js, react-pivottable, which was recently wrapped up into dash-pivottable, also doesn't support this stuff out of the box but supports the same kind of pluggable renderer (even easier to use actually) and so something like react-subtotal might be easy-ish to plug in. People on the forum are already asking about subtotals, naturally: https://community.plotly.com/t/dash-pivottable-released/43333/5

I should note that today, react-pivottable only supports a single measure (called "aggregators" internally) so you can't display sum(population) and sum(gdp) on the same table, which is a pretty serious limitation.

Marc-Andre-Rivet commented 3 years ago

Not getting into the finer points of the interaction with all the table props and focusing on the two main props at hand: data and columns. Trying to think of what the interaction would look like from both a native and custom (serverside) implementations.

We now have drilldowns, which requires us to keep track of the expansion state of the nodes
Considering roll-ups implicitly covered by the transformations below

Native

f: data, group_by, aggregations -> tree
f: tree, ui_expansions -> data*

f: columns, hidden_columns, group_by -> columns*

tree: n-ary tree with each node essentially a (datum, descendants)
ui_expansion: n-ary sparse tree matching the depth and width of nodes if the tree
Leaves: each datum* in data* that corresponds to an original datum from data is essentially the same as the original datum but contains additional information for level and is_leaf
Internal nodes: Correspond to one step of the group-by and contains a datum* generated from the aggregation rules
Columns: Grouping/Aggregations might reorder/add/remove columns

Caveats

data* can now be used to update the derived_**_data props -- do we match presentation or underlying structure) -- e.g. are collapsed nodes excluded (would assume so)
any calculated fields like level and is_leaf would either need to be scrubbed or ignored if present in callback return for data

Custom

If custom, one would have to assume that data is already data* and as such

f: data* -> tree
f: tree, ui_expansions -> data*

Not sure how well this aligns with, say, pandas functionality. Do you lose the raw data during the groupby process?

Some finer points

active_cell alignment on expand/collapse
selected_cells alignment and fragmentation on expand/collapse
active_cell and selected_cells on data edit and on data callback (or any other callbacks affecting calculation of data*)
grouping: on raw data or presentation?
aggregations: on raw data or sanitized data? (e.g. null -> 0)
formatting: aggregated values may not match underlying data type
paging: what does paging mean when your data can be expanded/collapsed? -- maybe the only valid value is paging: 'none'
sorting: sort per-level or top-level only?
filtering: match any nested to display?
sorting / filtering: might we end up operating on the tree itself?

f: tree, sort, filter -> tree

plotly / dash-table

Aggregations and Grouping (aka drill-down/roll-up with grand/sub-totals) #825