jthomasmock / gtExtras

A Collection of Helper Functions for the gt Package.
https://jthomasmock.github.io/gtExtras/

Not extrapolating missings in `gt_plt_dist` #90

Closed psyguy closed 1 year ago

psyguy commented 1 year ago

Hi there, thanks a lot for the wonderful package!

I want to add gt_plt_dist to a gt table. My data has quite a few missing values, and the function extrapolates across the NAs in the embedded plot, which is the opposite of what I want.

Expected behavior

Here's an MWE:

library(gt)
library(gtExtras)
library(tidyverse)

# Making a dataset
set.seed(2023-05-04)
N <- 100
d <- data.frame(id = rep(c("A", "B", "C"),
                         each = N),
                t = rep(1:N, 3),
                y = rnorm(3*N))

# Adding missing values
d$y[sample(1:N, N/4)] <- NA

# ggplot without extrapolation
d %>%
  ggplot() +
  aes(x = t,
      y = y) +
  geom_line(linewidth = rel(1)) +
  geom_point(size = rel(1.5),
             color = "red") +
  facet_wrap(~id,
             nrow = 3)

Created on 2023-05-04 with reprex v2.0.2

This makes the following plot without NA extrapolation (the thing I am interested in):

image

Current undesired output

On the other hand, with gt_plt_sparkline,

# plotting using gtExtras
d %>%
  dplyr::group_by(id) %>%
  dplyr::summarize(ts = list(y),
                   .groups = "drop") %>%
  gt() %>%
  gt_plt_sparkline(ts,
                   fig_dim = c(20, 100),
                   palette = c("black", rep("transparent", 2), "lightgray"),
                   type = "points")

Created on 2023-05-04 with reprex v2.0.2

I get these sparklines that do not show line breaks (which is in line with #52, I suppose):

image

I tried inspecting the source code of gt_plt_sparkline (ll. 167-172), but could not figure out where the extrapolation is coming from. Could it be fixed?

Thanks.

jthomasmock commented 1 year ago

Appreciate the feedback! I think you're actually asking about gt_plt_sparkline and not gt_plt_dist?

Handling NAs is quite tricky with the plotting functions since I'm essentially just dealing with vectors rather than proper dataframes. In this case, NAs are not extrapolated but rather ignored completely.

# here, NAs are excluded
vals <- as.double(stats::na.omit(list_data_in))
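
To illustrate why dropping NAs can look like extrapolation in the plot, here is a minimal base R sketch. The vector `y` and the names `x_dropped`, `x_kept`, and `gap_positions` are made up for illustration and are not part of gtExtras:

```r
# A toy series with a two-point gap in the middle (illustrative values)
y <- c(1.2, 0.8, NA, NA, 1.5, 0.9)

# Dropping NAs, as in the snippet above, shortens the vector
vals <- as.double(stats::na.omit(y))
length(y)     # 6
length(vals)  # 4

# Plotting `vals` against a fresh index closes the gap:
# 0.8 and 1.5 become direct neighbours, so a line joins them
x_dropped <- seq_along(vals)

# Keeping the NAs together with the original index preserves
# where the gap was, which is what a line break would need
x_kept <- seq_along(y)
gap_positions <- which(is.na(y))
```

Once the NAs are removed there is no record of where they were, so the line simply connects the remaining points.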

There have been a lot of edge cases across the plotting functions when accepting NAs, but I think I have a fix for this, and it should be resolved in gtExtras v0.5:

library(gt)
library(gtExtras)
library(tidyverse)

# Making a dataset
set.seed(2023 - 05 - 04)
N <- 100
d <- data.frame(
  id = rep(
    c("A", "B", "C"),
    each = N
  ),
  t = rep(1:N, 3),
  y = rnorm(3 * N)
)

# Adding missing values
d$y[sample(1:N, N / 4)] <- NA

d %>%
  dplyr::group_by(id) %>%
  dplyr::summarize(
    ts = list(y),
    .groups = "drop"
  ) %>%
  gt() %>%
  gt_plt_sparkline(
    ts,
    fig_dim = c(20, 100),
    palette = c("black", rep("transparent", 3), "lightgray"),
    type = "points"
  ) |>
  gt_reprex_image()

Created on 2023-07-27 by the reprex package (v2.0.1)
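
When checking the resulting sparklines, it may help to count the missing values per group first. A minimal sketch in base R, rebuilding the dataset from the example above; `na_per_id` is a name introduced here for illustration. Because `sample(1:N, N / 4)` draws row indices between 1 and 100, and those rows all belong to id "A", only that series should contain gaps:

```r
# Rebuild the example dataset
set.seed(2023 - 05 - 04)
N <- 100
d <- data.frame(
  id = rep(c("A", "B", "C"), each = N),
  t = rep(1:N, 3),
  y = rnorm(3 * N)
)

# Adding missing values: indices 1..N are all rows of id "A"
d$y[sample(1:N, N / 4)] <- NA

# Count NAs per id (named integer vector)
na_per_id <- vapply(split(is.na(d$y), d$id), sum, integer(1))
na_per_id
```

To spread the missing values over all three series, one option would be to sample from `seq_len(nrow(d))` instead of `1:N`.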