General spec_plotfun function

haozhu233 commented 4 years ago

@r2evans made a great suggestion that we should create a general spec_plot function that allows users to feed in customized plotting functions. Here is the issue to track the progress and manage the discussions on this topic.

r2evans commented 4 years ago

Sorry, I probably made the PR a bit more detailed than it needed to be. I'll copy the examples from there so that we can discuss it here.

Using the spec_plot as defined in #505 and this code:

mpg_list <- split(mtcars$mpg, mtcars$cyl)
inline_plot <- data.frame(cyl = c(4, 6, 8), mpg_box = "", mpg_hist = "",
                          spec_line = "", small_lines = "", small_lines_gg = "")
inline_plot %>% 
  kbl(booktabs = T) %>%
  kable_paper(full_width = F) %>%
  column_spec(2, image = spec_boxplot(mpg_list)) %>%
  column_spec(3, image = spec_hist(mpg_list)) %>%
  column_spec(4, image = spec_line(mpg_list, same_lim = FALSE)) %>%
  column_spec(5, image = spec_plot(small_lines(), mpg_list)) %>%
  column_spec(6, image = spec_plot(small_lines_gg(), mpg_list))

we can generate

and take it to a bit more complicated with:

set.seed(42)
x <- cumsum(rnorm(20))

somefunc <- function(z) {
  dat <- data.frame(
    x = seq_along(z),
    y = z,
    mu = zoo::rollapply(z, 9, mean, partial = TRUE),
    sigma = 12*zoo::rollapply(z, 9, sd, partial = TRUE)
  )
  graphics::par(mar = c(0, 0, 0.2, 0), lwd=0.5)
  plot(NA, type = "n", xaxt = "n", yaxt = "n", ann = FALSE, frame.plot = FALSE,
       xlim = c(1, length(z)), ylim = range(z) + c(-1,1)*max(dat$sigma))
  polygon(c(dat$x, rev(dat$x), dat$x[1]),
          c(dat$mu + dat$sigma, rev(dat$mu - dat$sigma), dat$mu[1] + dat$sigma[1]),
          col = "gray80", lwd = 0.1)
  lines(mu~x, data=dat, col = "red", lwd = 0.2)
}

data.frame(a="quux", b="") %>%
  kbl(booktabs = T) %>%
  kable_paper(full_width = F) %>%
  column_spec(2, image = spec_plot(somefunc, list(x)))

r2evans commented 4 years ago

Some concerns I have, looking for your comments:

the technique that allows spec_plot to deal with both base graphics and ggplot2 graphics is that if there's a return object from the UDF that inherits "ggplot", then we discard the already-attempted grDevices::svg-saved image (or png), because ggplot2 performs inconsistently (at least with svg). To get it to work, I get the ggplot2::ggsave function with
```
ggsave <- get("ggsave", envir=as.environment("package:ggplot2"))
```
so that we don't necessarily trigger a CRAN-complaint about not Importing or Depends on ggplot2. The reason I don't go so far as to requireNamespace("ggplot2") is that if we are collecting a graphic of class "ggplot", I find it highly unlikely that we need to try hard to load the namespace and fail if not found. While I can contrive of an example where a system returns a class "ggplot"-object without have ggplot2-package available, it is ... obscure.
I added ggplot2 to Suggests:, since this package will be able to take advantage of it if present. Your other option is to add ggplot2 to Imports: and we can access ggsave with the standard double-colon notation without CRAN complaining ... but I was not assuming that you wanted to add that otherwise-optional package as an installation prerequisite.
I knowingly run the inefficiency of calling svg(); ...; dev.off() and then overwriting with ggsave, since we don't know until the plot function is complete whether we can infer it is ggplot-like. Since this is not likely to surprise the user, we could certainly require spec_plot(func, mpg_list, isggplot=TRUE) or similar, and remove the double-tap on ::svg(). I don't think it's a problem, frankly, other than a little extra processing.
I think I am handling ggsave calls correctly. I am not proficient at all of the units= and such that it infers/requires, so I found a combination that works. Oddly, I found I must use device=grDevices::svg but cannot use device=grDevices::png, failing for width/height problems. There might be a better way to handle this in the if (inherits(thisplot, "ggplot")) block.
I added the use of on.exit(dev.off(curdev)) (also in spec_line) because during testing, I often interrupted things and had a stale graphics device sitting around. Calling dev.off(curdev) multiple times (on the same dev) is harmless and repeats are a no-op. For consistency, it might be good to add this to spec_boxplot and spec_hist as well.

And some discussion points on a separate vignette on UDFs:

"Static" functions are pretty straight-forward, as in the somefunc above;
"Flexible" functions are really just functions that return functions ... as in my small_lines example. It allows one to have a straight-forward UDF that is extensible. small_lines() returns a function (with formals x,y), and it utilizes all of the defaults. One can override certain defaults, ala
```
... %>%
  column_spec(2, image = spec_plot(small_lines(              ), mpg_list)) %>%
  column_spec(3, image = spec_plot(small_lines(col = "green4"), mpg_list))
```
besides example functions in a vignette (I think a separate one might be in order), do you think we should add arguments to spec_plot that starts with a compliant plot(...) (margins set, etc) and allows the UDF to just add lines, polygons, etc? Similarly, ggplot-analog with theme already appended? I fear over-complexity of spec_plot and generally recommend against it, especially since I believe that your spec_{hist,boxplot,line} functions are really meant to be straight-forward/simple and safe.
another alternative I considered (and discarded temporarily for simplicity/time) was using expressions instead of functions, perhaps similar to rlang, perhaps just unevaluated (substituted) calls with placeholders for the actual values. I don't know that this adds any capability over the "simple" and "flexible" functions above, and it does add a little complexity ... I'd think the only would be that it would change from these
```
... %>%
  column_spec(2, image = spec_plot(somefunc, mpg_list)) %>%
  column_spec(3, image = spec_plot(function(x) somefunc(x, col="green4"), mpg_list))
```
to these
```
... %>%
  column_spec(2, image = spec_plot(somefunc(.x), mpg_list)) %>%
  column_spec(3, image = spec_plot(somefunc(.x, col="green4"), mpg_list))
```
(Perhaps a little more readable.) The risk is that if you have plots requiring multiple list arguments, it then because purrr-like, ala
```
... %>%
  column_spec(2, image = spec_plot(somefunc(.x, .y, .z), list1, list2, list3)) %>%
  column_spec(3, image = spec_plot(somefunc(..1, ..2, ..3, col="green4"), list1, list2, list3))
```

harrelfe commented 1 year ago

I tried running the earlier somefunc code (but using pipe |>) and got Error in min(x, na.rm = na.rm) : invalid 'type' (list) of argument. Is this general "somefunc" approach implemented?

r2evans commented 1 year ago

@harrelfe I don't think so, unfortunately. I think all (most, at least) of my use-cases are currently covered by the other spec_* functions, so I slowed development on it and moved on. What's your use-case where you'd prefer to use a UDF for plotting?

haozhu233 / kableExtra

General spec_plotfun function #503