Open nicolaskruchten opened 3 years ago
Related forum posts:
My original PivotTable.js doesn't support this out of the box but it does supports pluggable renderers and someone made a plugin for it called Subtotal.js which does both row- and column-wise aggregation with rollup/drilldown etc.
My React port of Pivottable.js, react-pivottable, which was recently wrapped up into dash-pivottable, also doesn't support this stuff out of the box but supports the same kind of pluggable renderer (even easier to use actually) and so something like react-subtotal might be easy-ish to plug in. People on the forum are already asking about subtotals, naturally: https://community.plotly.com/t/dash-pivottable-released/43333/5
I should note that today, react-pivottable only supports a single measure (called "aggregators" internally) so you can't display sum(population) and sum(gdp) on the same table, which is a pretty serious limitation.
Not getting into the finer points of the interaction with all the table props and focusing on the two main props at hand: data
and columns
. Trying to think of what the interaction would look like from both a native
and custom
(serverside) implementations.
Native
f: data, group_by, aggregations -> tree
f: tree, ui_expansions -> data*
f: columns, hidden_columns, group_by -> columns*
tree: n-ary tree with each node essentially a (datum, descendants)
ui_expansion: n-ary sparse tree matching the depth and width of nodes if the tree
Leaves: each datum*
in data*
that corresponds to an original datum
from data
is essentially the same as the original datum
but contains additional information for level
and is_leaf
Internal nodes: Correspond to one step of the group-by and contains a datum*
generated from the aggregation rules
Columns: Grouping/Aggregations might reorder/add/remove columns
Caveats
data*
can now be used to update the derived_**_data
props -- do we match presentation or underlying structure) -- e.g. are collapsed nodes excluded (would assume so)level
and is_leaf
would either need to be scrubbed or ignored if present in callback return for data
Custom
If custom, one would have to assume that data
is already data*
and as such
f: data* -> tree
f: tree, ui_expansions -> data*
Not sure how well this aligns with, say, pandas
functionality. Do you lose the raw data during the groupby process?
Some finer points
active_cell
alignment on expand/collapse
selected_cells
alignment and fragmentation on expand/collapse
active_cell
and selected_cells
on data edit and on data callback (or any other callbacks affecting calculation of data*
)
grouping: on raw data or presentation?
aggregations: on raw data or sanitized data? (e.g. null -> 0)
formatting: aggregated values may not match underlying data type
paging: what does paging mean when your data can be expanded/collapsed? -- maybe the only valid value is paging: 'none'
sorting: sort per-level or top-level only?
filtering: match any nested to display?
sorting / filtering: might we end up operating on the tree itself?
f: tree, sort, filter -> tree
Many similar tables i.e. AG-Grid have support for showing aggregations of rows, rather than raw rows: https://www.ag-grid.com/javascript-grid-grouping/ or Reactable in R-Shiny-land https://glin.github.io/reactable/articles/examples.html#grouping-and-aggregation ... a typical use-case for this is financial reporting, where the columns are months and the rows are departments broken down by type of spending, say.
The way this works, conceptually, is that for a given dataset with columns "country, city, population", instead of displaying all the rows, you would say "summarize population via sum, and group by country" then you end up with one row displayed per country, and a derived column which is the sum of population. We could include more than one column to be aggregated (i.e. add GDP) and other aggregation functions (e.g. mean, median etc) and even complex totally-derived "measure" columns that don't map 1-1 with input columns, and which are the result of applying functions on multiple existing columns (e.g. sum(col A)/sum(col B)). The idea of a "summary column" to do row-wise aggregation fits into this also: you might easily imagine wanting to have one column on the far right that represents the sum of all the other columns, if all the other columns are sums of their rows.
Once a table knows how to do the above, you could allow it to "drill down" via a "drill-down path" (aka hierarchy or tree-like view) like [country, city] such that clicking on a country would cause per-city rows to appear underneath and slightly indented. The drill-down path could be deeper than 2 levels, of course. Given the ability to drill down, we can also imagine the ability to show expanded groups on initial render, and allow the ability to "roll up" on demand.
This implicitly defines "row groups" i.e. all the cities are grouped by country. When a group is expanded, we could bring in the notion of a "subtotal row" which would do the same thing as a total row/footer on a per-group basis. The notion of a table-wide footer is a specialization of this, essentially (see #433, #627).
For completeness, the same kind of thing might be wanted column-wise i.e. column-groups, although in my experience this is rarer.
So the API here basically looks something like:
How would this intersect with:
keywords for search engines and searching in other places like the forum: hierarchy, tree, groups, drill-down, roll-up, aggregate, total, subtotal, grand total, expand/collapse, details/summary