plotly / dash-table

OBSOLETE: now part of https://github.com/plotly/dash
https://dash.plotly.com
MIT License
419 stars 74 forks source link

Aggregations and Grouping (aka drill-down/roll-up with grand/sub-totals) #825

Open nicolaskruchten opened 3 years ago

nicolaskruchten commented 3 years ago

Many similar tables i.e. AG-Grid have support for showing aggregations of rows, rather than raw rows: https://www.ag-grid.com/javascript-grid-grouping/ or Reactable in R-Shiny-land https://glin.github.io/reactable/articles/examples.html#grouping-and-aggregation ... a typical use-case for this is financial reporting, where the columns are months and the rows are departments broken down by type of spending, say.

The way this works, conceptually, is that for a given dataset with columns "country, city, population", instead of displaying all the rows, you would say "summarize population via sum, and group by country" then you end up with one row displayed per country, and a derived column which is the sum of population. We could include more than one column to be aggregated (i.e. add GDP) and other aggregation functions (e.g. mean, median etc) and even complex totally-derived "measure" columns that don't map 1-1 with input columns, and which are the result of applying functions on multiple existing columns (e.g. sum(col A)/sum(col B)). The idea of a "summary column" to do row-wise aggregation fits into this also: you might easily imagine wanting to have one column on the far right that represents the sum of all the other columns, if all the other columns are sums of their rows.

Once a table knows how to do the above, you could allow it to "drill down" via a "drill-down path" (aka hierarchy or tree-like view) like [country, city] such that clicking on a country would cause per-city rows to appear underneath and slightly indented. The drill-down path could be deeper than 2 levels, of course. Given the ability to drill down, we can also imagine the ability to show expanded groups on initial render, and allow the ability to "roll up" on demand.

This implicitly defines "row groups" i.e. all the cities are grouped by country. When a group is expanded, we could bring in the notion of a "subtotal row" which would do the same thing as a total row/footer on a per-group basis. The notion of a table-wide footer is a specialization of this, essentially (see #433, #627).

For completeness, the same kind of thing might be wanted column-wise i.e. column-groups, although in my experience this is rarer.

So the API here basically looks something like:

How would this intersect with:

keywords for search engines and searching in other places like the forum: hierarchy, tree, groups, drill-down, roll-up, aggregate, total, subtotal, grand total, expand/collapse, details/summary

nicolaskruchten commented 3 years ago

Related forum posts:

nicolaskruchten commented 3 years ago

My original PivotTable.js doesn't support this out of the box but it does supports pluggable renderers and someone made a plugin for it called Subtotal.js which does both row- and column-wise aggregation with rollup/drilldown etc.

My React port of Pivottable.js, react-pivottable, which was recently wrapped up into dash-pivottable, also doesn't support this stuff out of the box but supports the same kind of pluggable renderer (even easier to use actually) and so something like react-subtotal might be easy-ish to plug in. People on the forum are already asking about subtotals, naturally: https://community.plotly.com/t/dash-pivottable-released/43333/5

I should note that today, react-pivottable only supports a single measure (called "aggregators" internally) so you can't display sum(population) and sum(gdp) on the same table, which is a pretty serious limitation.

Marc-Andre-Rivet commented 3 years ago

Not getting into the finer points of the interaction with all the table props and focusing on the two main props at hand: data and columns. Trying to think of what the interaction would look like from both a native and custom (serverside) implementations.

Native

f: data, group_by, aggregations -> tree
f: tree, ui_expansions -> data*

f: columns, hidden_columns, group_by -> columns*

Caveats

Custom

If custom, one would have to assume that data is already data* and as such

f: data* -> tree
f: tree, ui_expansions -> data*

Not sure how well this aligns with, say, pandas functionality. Do you lose the raw data during the groupby process?

Some finer points

f: tree, sort, filter -> tree