jthomasmock / gtExtras

A Collection of Helper Functions for the gt Package.
https://jthomasmock.github.io/gtExtras/

gt_plt_dist: different plot types on different rows? #102

Closed DanielEWeeks closed 1 year ago

DanielEWeeks commented 1 year ago

Question

Currently gt_plt_dist uses the same single type for all rows in the table.

However, our tables often have a mixture of continuous and categorical variables, so could gt_plt_dist be extended to accept a vector type with one entry for each row of the table, thereby allowing different plot types to be generated for different rows?

jthomasmock commented 1 year ago

Howdy @DanielEWeeks - thanks for the feature request. Let me ponder on this one - it's a fairly big change to the interface/internals, as we'd now need a 2nd column that represents the type.

jthomasmock commented 1 year ago

Theoretically I have this working...

image

But I'm not sure if there's some interaction with bw that needs to be respected. Any strong opinions/concerns?

In this case, same_limit = FALSE is maybe the best answer... Do you have thoughts/opinions on that?

library(dplyr)
library(gt)
library(gtExtras)

# One list-column entry of raw values per row, plus a per-row plot type
df <- tibble(
  trait = c("age", "weight", "time"),
  Distribution = list(
    rnorm(1000),                        # continuous, centered on 0
    sample(1:10, 1000, replace = TRUE), # discrete integers 1 to 10
    runif(1000, min = 5, max = 10)      # continuous, between 5 and 10
  ),
  type2 = c("density", "histogram", "density")
)

df

# This stops with an error
df |>
  gt() |>
  gt_plt_dist(
    Distribution,
    same_limit = TRUE,
    type_col = type2,
    fig_dim = c(5, 30)
  )

DanielEWeeks commented 1 year ago

But it looks like your default computation of bw as

bw <- stats::bw.nrd0(stats::na.omit(as.vector(data_in)))

would not be altered by changing the same_limit setting.
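To make the point concrete, here is a minimal base R sketch comparing a single bandwidth computed from pooled values with one bandwidth per row. It assumes that `data_in` in the line quoted above holds the values of all rows combined; the names `dists`, `bw_pooled`, and `bw_by_row` are made up for illustration and are not part of gtExtras.

```r
set.seed(42)

# Three rows on different scales, mirroring the example data earlier in the thread
dists <- list(
  age    = rnorm(1000),
  weight = sample(1:10, 1000, replace = TRUE),
  time   = runif(1000, min = 5, max = 10)
)

# Assumption: a single bandwidth computed from all rows pooled together
bw_pooled <- stats::bw.nrd0(stats::na.omit(unlist(dists)))

# Alternative: one bandwidth per row, computed from that row's values only
bw_by_row <- vapply(
  dists,
  function(x) stats::bw.nrd0(stats::na.omit(as.numeric(x))),
  numeric(1)
)

bw_pooled
bw_by_row
```

Neither value depends on any axis-limit setting, which is consistent with the observation that changing same_limit would not alter bw.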

However, when drawing multiple histograms, one might want different bin widths for different histograms rather than a common bw value across all of them.

To support that, rather than a vector of graph types, one would need to accept a list (or data frame) of graph parameters, so that each spark plot could have individually customized parameters if desired.

I guess which parameters go in the list would depend on how much per-plot customization is likely to be useful or desired.

I would think most of the following would typically be applied in common across all the spark plots, but type and bw could vary (although to make a particular spark plot stand out, one might also want to draw it in a different color):

  type = "density",
  fig_dim = c(5, 30),
  line_color = "black",
  fill_color = "grey",
  bw = NULL,
  trim = FALSE,

Having a list or data frame of graph parameters sounds a bit hard to support, but maybe not if there's a function to populate a default parameter list based on a vector of desired types.
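As a rough sketch of that idea, the hypothetical helper below builds one parameter list per row from a vector of types, using the defaults listed above. The name `default_plot_params()` and the `valid_types` argument are assumptions made for illustration and are not part of gtExtras; only the two types mentioned in this thread are accepted by default.

```r
# Hypothetical helper, not part of gtExtras: build one parameter list per row
default_plot_params <- function(types,
                                fig_dim = c(5, 30),
                                line_color = "black",
                                fill_color = "grey",
                                bw = NULL,
                                trim = FALSE,
                                valid_types = c("density", "histogram")) {
  # Fail early on unknown plot types instead of silently dropping them
  bad <- setdiff(types, valid_types)
  if (length(bad) > 0) {
    stop("Unknown plot type(s): ", paste(bad, collapse = ", "))
  }

  # Every row starts from the same defaults; only `type` differs
  lapply(types, function(type) {
    list(
      type = type,
      fig_dim = fig_dim,
      line_color = line_color,
      fill_color = fill_color,
      bw = bw,
      trim = trim
    )
  })
}

# Start from common defaults, then customize individual rows
params <- default_plot_params(c("density", "histogram", "density"))
params[[2]]$bw <- 0.5           # row-specific bin width for the histogram
params[[3]]$fill_color <- "red" # make one spark plot stand out
```

With something like this, the common case stays a one-liner, and per-row overrides are plain list assignments.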