jthomasmock / gtExtras

A Collection of Helper Functions for the gt Package.
https://jthomasmock.github.io/gtExtras/
Other
193 stars 26 forks

Allow use of a 3rd column to color the column in the `gt_merge_stack_color()` function #83

Closed Josephhero closed 11 months ago

Josephhero commented 1 year ago

Prework

No duplicates, though this issue touched on it: https://github.com/jthomasmock/gtExtras/issues/71

Proposal

When using the gt_merge_stack_color() function, allow the cells to be colored by a separate third column, rather than by one of the merged columns. Ideally, using the reprex provided, I would like to merge the top_stack and bottom_stack columns, which this function allows, but I would like to be able to color it based on the values in the percentiles column. So instead of having just a top_val and color_val, have a top_val, a bottom_val, and a color_val.

reprex:

library(gt)
library(gtExtras)

tbl_data <- tibble::tibble(
  top_stack = sample(0:50, 5, TRUE),
  bottom_stack = sample(0:10, 5, TRUE),
  percentiles = sample(0:100, 5, TRUE)
)

table <- 
  gt(tbl_data) |> 
  gt_merge_stack_color(top_val = top_stack, 
                       color_val = bottom_stack)
#> Warning: Domain not specified, defaulting to observed range within each
#> specified column.

Created on 2023-02-21 with reprex v2.0.2

My proposal would end up looking something like this:

table <- 
  gt(tbl_data) |> 
  gt_merge_stack_color(top_val = top_stack, 
                       bottom_val = bottom_stack,
                       color_val = percentiles)
jthomasmock commented 1 year ago

Howdy @Josephhero thanks for the FR!

I struggle to move forward with these 3 column combinations, as we typically can no longer match them cleanly to the original columns. What I mean is that I am not following the semantic purpose of the color. The background in the current function applies to the top colorizing value (i.e. it's dual encoded as a value and a color).

I get wanting to color the top and bottom according to say a category or their value, but then changing the background of the cell in conjunction with that from a separate column feels a bit off as we are combining too many things.

Can you help me understand a real life example?

Josephhero commented 1 year ago

Hi Thomas! Yes, I should have provided a clearer example. What I am trying to do is calculate the percentile across an entire dataset, but I am only plotting one or two categories out of that dataset. I want the colors to reflect the percentile within the entire dataset, not just the range within the two categories. Here's a reprex:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(gt)
library(gtExtras)

tbl_data <- tibble::tibble(
  categories = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"), 
  top_stack = sample(0:50, 10, TRUE),
  bottom_stack = sample(0:10, 10, TRUE),
  percentiles = sample(0:100, 10, TRUE)
)

table <- 
  gt(filter(tbl_data, categories %in% c("a", "b"))) |> 
  gt_merge_stack_color(top_val = top_stack, 
                       color_val = bottom_stack)
#> Warning: Domain not specified, defaulting to observed range within each
#> specified column.

Created on 2023-03-17 with reprex v2.0.2

The real life example I am working with is NFL data, where I want to stack two data items (say, epa/play on top and number of plays under it), but I want the cell color to indicate where the epa/play ranks among all NFL teams, even though I'm only plotting one team.
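For reference, something close to this can be sketched with core gt functions instead of `gt_merge_stack_color()`. This is only a rough workaround under stated assumptions, not how gtExtras does it: the palette, the fixed 0 to 100 domain, and the names `pal`, `plot_data`, and `fill_colors` are made up for illustration, and the `<br>` in the merge pattern is expected to stack the values only in HTML output.

```r
library(gt)
library(scales)

set.seed(1)
tbl_data <- data.frame(
  categories = rep(c("a", "b", "c", "d", "e"), each = 2),
  top_stack = sample(0:50, 10, TRUE),
  bottom_stack = sample(0:10, 10, TRUE),
  percentiles = sample(0:100, 10, TRUE)
)

# Color scale defined over the full percentile range (assumed 0 to 100),
# so the colors do not depend on which rows end up in the table
pal <- col_numeric(palette = c("#512daa", "white", "#2d6a22"), domain = c(0, 100))

# Keep only the categories to display, then look up one fill color per row
plot_data <- tbl_data[tbl_data$categories %in% c("a", "b"), ]
fill_colors <- pal(plot_data$percentiles)

# Stack the two values in one cell and hide the column that drives the color
tbl <- gt(plot_data) |>
  cols_merge(columns = c(top_stack, bottom_stack), pattern = "{1}<br>{2}") |>
  cols_hide(columns = percentiles)

# Apply the precomputed fill to each merged cell, one row at a time
for (i in seq_len(nrow(plot_data))) {
  tbl <- tbl |>
    tab_style(
      style = cell_fill(color = fill_colors[i]),
      locations = cells_body(columns = top_stack, rows = i)
    )
}
```

The trade-off is that none of the text styling from `gt_merge_stack_color()` (small caps, font sizes, automatic text contrast) carries over; those would have to be added by hand.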

If adding a third column as the color column is difficult, would it be possible instead to allow the user to choose which of the columns (top_val or bottom_val) to use as the color value, rather than forcing it to be the bottom value? That would at least allow me to set a domain of domain = min(tbl_data$top_stack):max(tbl_data$top_stack), which would usually be enough to solve this issue.
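To make the domain point concrete, here is a small sketch of how a fixed domain changes the color mapping. It calls `scales::col_numeric()` directly rather than anything in gtExtras, and the values, palette, and the names `full_data`, `subset_rows`, `pal_subset`, and `pal_full` are invented for illustration.

```r
library(scales)

# Hypothetical values: ten in the full dataset, only the first four plotted
full_data <- c(2, 8, 15, 21, 27, 33, 38, 42, 47, 50)
subset_rows <- full_data[1:4]

# Palette scaled to the observed range of the plotted rows only, which is
# the fallback the "Domain not specified" warning describes
pal_subset <- col_numeric(
  palette = c("white", "darkgreen"),
  domain = range(subset_rows)
)

# Palette scaled to the range of the entire dataset
pal_full <- col_numeric(
  palette = c("white", "darkgreen"),
  domain = range(full_data)
)

# Same four values, two different sets of colors
colors_subset <- pal_subset(subset_rows)
colors_full <- pal_full(subset_rows)
```

With the subset domain, the largest plotted value (21) sits at the dark end of the palette; with the full-data domain it lands below the midpoint of the 2 to 50 range, which is the behavior being asked for.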

jthomasmock commented 1 year ago

Hi Joseph - sorry to leave this one hanging 😢

I think I understand the interest, but I'm not quite sold on a 3rd column coloring a cell that already holds two numerics, given the potential for edge cases that arise.

jthomasmock commented 11 months ago

Closing as not planned.