jthomasmock / gtExtras

A Collection of Helper Functions for the gt Package.
https://jthomasmock.github.io/gtExtras/

Allow to show all points in `gt_sparkline()` #31

Closed z3tt closed 2 years ago

z3tt commented 2 years ago

Hi Thomas, many thanks again for this neat package. It was by far the simplest way to combine sparklines with a table.

When using gt_sparkline(), the current logic is to plot the extreme values plus the last observation. I get the idea, and while it makes sense when you have many points, especially with a lot of variability, there are use cases where showing all points (or omitting some others) is more advantageous.

I attach a screenshot of such a situation. Here, most values are zero and only occasionally are there a few positive values. This results in a table in which most sparklines show all points, but some look odd because 1-3 points are missing:

image

It would be great if one could optionally control this behavior, e.g. keep the current behavior as the default and add a setting that draws all points.

jthomasmock commented 2 years ago

Thanks Cedric! I've got a few different sparkline subtypes that I'll be adding, including a geom_point() on every data value. Regarding the screenshot you included, is that an existing table made with gtExtras? If so, can you share a reprex to recreate it so that I can create tests and examples against it?

I'm also not following why there would be missing points, as seen in row 4. I get that you may have a line that is mostly flat, but I don't understand why there are missing points at all. Thanks!

z3tt commented 2 years ago

Sure Thomas, here is a simplified example, leading to a slightly different table but with the same "issue". The n column shows the count of non-zero values (so as you can see there should be 5 points in that example).

library(dplyr)
library(gt)
library(gtExtras)

data <- 
  structure(list(
    id = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", 
           "2", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", 
           "3", "3", "3", "3", "3", "4", "4", "4", "4", "4", "4", "4", "4", "4", 
           "4", "4", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5"), 
    n = c(0, 0, 0, 0, 0,  0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 
          0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 
          0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0), 
    year = c(2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 
             2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 
             2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 
             2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 
             2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021)), 
    row.names = c(NA, -55L), class = c("tbl_df", "tbl", "data.frame")
  ) %>% 
  group_by(id) %>% 
  summarize(total = sum(n), timeseries = list(n), .groups = "drop")

data %>% 
  gt() %>%
  ## add timeseries as sparkline
  gt_sparkline(
    timeseries, type = "sparkline", 
    range_colors = c("grey70", "black"), 
    line_color = "grey70",
    label = FALSE
  )

image

Created on 2021-12-17 by the reprex package (v2.0.1)

jthomasmock commented 2 years ago

Gotcha, so while it looks like your points are "missing" in this case, they are not plotted since they are not the "final" point or the max/min points.
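To make that rule concrete, here is a minimal base R sketch of a "min / max / last" selection applied to row 3 of the reprex. The helper `highlighted_positions()` is hypothetical and is not how gtExtras is implemented internally; it also assumes that only the first minimum and the first maximum are marked, which may differ from the package's actual behavior.

```r
# Hypothetical helper (not part of gtExtras): positions that a
# "min / max / last observation" rule would mark for one series.
# Assumption: ties are resolved by taking the first min and first max.
highlighted_positions <- function(x) {
  sort(unique(c(which.min(x), which.max(x), length(x))))
}

# Values for id "3" in the reprex above
row3 <- c(0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 2)

marked   <- highlighted_positions(row3)  # positions that get a point
non_zero <- which(row3 != 0)             # positions a reader expects to see
unmarked <- setdiff(non_zero, marked)    # non-zero values drawn without a point

marked
non_zero
unmarked
```

Under these assumptions, several non-zero positions end up in `unmarked`: the line still passes through them, but they get no point, which is what makes them look "missing" in the table.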

The newer version, gt_plt_sparkline(), will have a type = "points" option that plots a point for every observation, which I believe is the behavior you want.

See below for the future version:

gt_plt_sparkline

z3tt commented 2 years ago

Exactly, Thomas. I was aware that the points are not missing from the data set (something I could fix) and that this is the intended behavior, which simply doesn't fit my needs here. Thanks for including this option in future versions, much appreciated!

jthomasmock commented 2 years ago

This feature has been added in the latest release; see the code below, which uses gtExtras::gt_plt_sparkline(), the replacement for gt_sparkline(). I split the function into two parts so that it has the flexibility needed for the two use cases (showing all the points/lines like a traditional sparkline vs. a summary distribution like a histogram, boxplot, etc.).

library(dplyr, warn.conflicts = FALSE)
library(gt)
library(gtExtras)

data <- 
  structure(list(
    id = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", 
      "2", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", 
      "3", "3", "3", "3", "3", "4", "4", "4", "4", "4", "4", "4", "4", "4", 
      "4", "4", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5"), 
    n = c(0, 0, 0, 0, 0,  0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 
      0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 
      0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0), 
    year = c(2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 
      2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 
      2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 
      2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 
      2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021)), 
    row.names = c(NA, -55L), class = c("tbl_df", "tbl", "data.frame")
  ) %>% 
  group_by(id) %>% 
  summarize(total = sum(n), timeseries = list(n), .groups = "drop")

data %>% 
  gt() %>%
  ## add timeseries as sparkline
  gt_plt_sparkline(
    timeseries, type = "points", 
    pal = c("grey70", "grey70", "grey70", "black", "grey70"),
    label = FALSE
  ) %>% 
  gtsave("test.png")

Created on 2021-12-20 by the reprex package (v2.0.1)