New feature: handle superset / factor level palettes gracefully in gt_fa_column

mikedolanfliss commented 2 years ago

Prework

In reference to this prior question about multiple columns, thank you for the workaround. I think my question / "bug" was really a feature request, so here's a simpler feature request that might pass muster as being worth the lift.

I appreciate that handling multiple columns at once in gt_fa_column() may be a lot of work, so I understand falling back on the workaround you suggested: a separate gt_fa_column() call for each column. In the reprex it was 3 columns, but in practice for me that might be a dozen yes/no columns, or columns that share some other common set of factor levels, even if the values actually present in each column vary from run to run.

It still seems to me - borrowing some conventions from ggplot - that providing the levels of the factor as value-color pairs should work even if the column (maybe only temporarily) has fewer than the full set of allowable levels.

It's a common pattern to share a palette for an element across multiple ggplot graphs by defining that palette once (as you did with ex_pal <- c("square-check" = "green", "square-xmark" = "red")), even if an individual graph sometimes doesn't contain the full list of factor values (because of subsetting, for example).

Proposal

Looking at the implementation of gt_fa_column: to me, the factor levels aren't really the unique() values of x, but the levels() of x, if it were a factor or factorizable (character). So I thought about tweaks to gt_fa_column to do a factor comparison, something like all(levels(pal_filler) %in% levels(x)) as an alternative to your stopifnot...

fct_lvl <- unique(x)
stopifnot("The length of the unique elements must match the palette length" = length(fct_lvl) == length(pal_filler))

...But I think the simplest and safest addition to make the function more robust would be to simply require the unique values of x to all be represented in the palette, even if the palette has values not currently seen in the data. Then reduce the palette to just those values - which would then pass the stopifnot naturally. That would, I think, require changing only 2 lines: add an else if branch that massages the palette into shape. Something like the new second else if below:

if (is.null(palette)) {
  pal_filler <- c(
    "#000000", "#E69F00", "#56B4E9", "#009E73",
    "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
  )[seq_along(unique(x[!(x %in% c("", "NA", NA))]))]
} else if (length(palette) == 1) {
  pal_filler <- palette %>% rep(length(unique(x)))
} else if (all(unique(x) %in% names(palette))) {
  # palette is a superset of the values, so reduce it to just what's needed
  pal_filler <- palette[unique(x)]
} else {
  pal_filler <- palette
}
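To see the superset idea outside of gt_fa_column(), here is a minimal base-R sketch of the same reduction. The helper name subset_palette() is hypothetical and is not part of gtExtras; it assumes the palette is a named character vector and that the column values are (or can be coerced to) character.

```r
# Hypothetical helper, for illustration only (not the gtExtras implementation).
# Reduces a named palette to the values actually present in `x`,
# and errors if `x` contains a value the palette does not cover.
subset_palette <- function(x, palette) {
  vals <- unique(as.character(x))
  vals <- vals[!is.na(vals)]

  missing_vals <- setdiff(vals, names(palette))
  if (length(missing_vals) > 0) {
    stop("Values missing from palette: ", paste(missing_vals, collapse = ", "))
  }

  # Index by name so each value keeps its own color regardless of order
  palette[vals]
}

ex_pal <- c("square-check" = "green", "square-xmark" = "red")

col1 <- rep("square-check", 3)                              # only one level present
col2 <- c("square-check", "square-check", "square-xmark")   # both levels present

pal1 <- subset_palette(col1, ex_pal)
pal2 <- subset_palette(col2, ex_pal)
```

After the reduction, the palette length matches the number of unique values in each column, which is the condition the existing stopifnot checks.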

Thoughts? I'm trying to think of palette conventions in ggplot and have some recollection of this sometimes working. Could be wishful thinking though :)
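For reference, a small sketch of the ggplot2 convention mentioned above: a named vector passed to scale_colour_manual(values = ...) is matched to the data by name, so one palette can be reused on a plot that only contains a subset of the values. The data frames and the name status_pal are made up for illustration.

```r
library(ggplot2)

# Shared named palette (hypothetical values, for illustration)
status_pal <- c("pass" = "green", "fail" = "red")

full_df <- data.frame(
  x = 1:4,
  y = 1:4,
  status = c("pass", "fail", "pass", "fail")
)
# A subset in which only one of the palette's values appears
pass_only_df <- full_df[full_df$status == "pass", ]

p_full <- ggplot(full_df, aes(x, y, colour = status)) +
  geom_point() +
  scale_colour_manual(values = status_pal)

# Same palette, fewer values in the data: the unused "fail" entry is ignored
p_subset <- ggplot(pass_only_df, aes(x, y, colour = status)) +
  geom_point() +
  scale_colour_manual(values = status_pal)
```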

jthomasmock commented 2 years ago

Merged this into main after passing checks - thank you for the scoped feature request!

It also solves your specific issue related to col1:col3:

library(gt)
library(gtExtras)
library(tidyverse)

test_tbl <- tibble(col1 = rep(TRUE, 3), col2 = c(TRUE, TRUE, FALSE), col3 = rep(FALSE, 3))

ex_pal <- c("square-check" = "green", "square-xmark" = "red")

test_tab <- test_tbl %>% 
  mutate(across(where(is_logical), ~ case_when(.x ~ "square-check", !.x ~ "square-xmark"))) %>%
  gt()

test_tab %>%
  gt_fa_column(column = col1:col3, palette = ex_pal)

Created on 2022-10-02 by the reprex package (v2.0.1)

mikedolanfliss commented 2 years ago

Slick, thanks! Makes for a much tighter and more elegant gt() call in the end.

Your package is stellar. I'm using it at the moment for some public health quality check reports (hence the "what tests did each dataset pass?" use case). In practice I'll be doing things like ...

test_tab %>% gt_fa_column(column = matches("^qc_lgl_"), palette = lgl_pal)

... to color all the logical columns at once.

Thanks for your help and generosity!

jthomasmock commented 2 years ago

Nice! I'd also suggest checking out pointblank - by Rich Iannone (same author as gt). He's built some gt reporting into the overall data validation strategy:

https://rich-iannone.github.io/pointblank/

Example of one of the validation tables.

mikedolanfliss commented 2 years ago

Very interesting, thanks! I've been following the pattern of generating a different tibble() that represents the results of a set of tests... but will check out this package. Like the abstracted "agent" concept.

Again, appreciate the help!!

jthomasmock / gtExtras

New feature: handle superset / factor level palettes gracefully in gt_fa_column #70
