microsoft / datamations

https://microsoft.github.io/datamations/
Other
68 stars 15 forks source link

Allow pipeline without quoting #42

Open sharlagelfand opened 3 years ago

sharlagelfand commented 3 years ago

Would be very slick to allow the pipeline without needing to pass it as a character vector, e.g.

small_salary_data %>%
  group_by(Degree) %>%
  summarize(mean = mean(Salary)) %>%
  datamation_sanddance()

instead of

"small_salary_data %>% group_by(Degree) %>% summarize(mean = mean(Salary))" %>%
  datamation_sanddance()

Have been looking into this a bit but writing it down here to track!

sharlagelfand commented 3 years ago

I've been looking into this and it seems pretty easy when the pipeline doesn't contain ggplot2 code, but much more difficult (not possible?) when it doesn't. Some code / notes:

library(dplyr)
library(ggplot2)

#' You can get the code piped using sys.calls()
piped_code_from_call <- function(x, additional_arg = TRUE) {
  sys.calls()[[1]]
}

mtcars %>%
  count(cyl) %>%
  piped_code_from_call()

# mtcars %>% count(cyl) %>% piped_code_from_call()

#' And then e.g. convert to character and remove the %>% piped_code_from_call() from it

piped_code_from_call <- function(x, additional_arg = TRUE) {
  code <- deparse(sys.calls()[[1]])
  stringr::str_remove(code, " %>% piped_code_from_call\\(\\)")
}

mtcars %>%
  count(cyl) %>%
  piped_code_from_call()

# [1] "mtcars %>% count(cyl)"

#' But unfortunately this doesn't work withg ggplot2 code:

mtcars %>%
  count(cyl) %>%
  ggplot() +
  geom_point(aes(x = cyl)) %>%
  piped_code_from_call()

# Error: Can't add `geom_point(aes(x = cyl)) %>% piped_code_from_call()` to a ggplot object.
# Run `rlang::last_error()` to see where the error occurred.

#' It does to some extent if you wrap the entire pipeline / ggplot2 code in parentheses
(
  mtcars %>%
    count(cyl) %>%
    ggplot() +
    geom_point(aes(x = cyl))
) %>%
  piped_code_from_call()

# [1] "(mtcars %>% count(cyl) %>% ggplot() + geom_point(aes(x = cyl))) %>% "
# [2] "    piped_code_from_call()"     

#' But I'm not sure if different methods (i.e. no parentheses if no ggplot2 code, yes parentheses if ggplot2 code) is confusing / less desirable than just wrapping it in quotes
#' Ideally we could just pipe in, but I don't think that's possible...

I also have a feeling that this sys.calls() approach only really works interactively, so e.g. not in RMarkdown - I tried to render a reprex with the above code (so I wouldn't have to copy the output in myself), and this is how it looks - expand for the full yuck!

``` r library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union library(ggplot2) ``` You can get the code piped using sys.calls() ``` r piped_code_from_call <- function(x, additional_arg = TRUE) { sys.calls()[[1]] } mtcars %>% count(cyl) %>% piped_code_from_call() #> tryCatch(withCallingHandlers({ #> NULL #> saveRDS(do.call(do.call, c(readRDS("/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-fun-1586375dbfa00"), #> list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv, #> quote = TRUE), file = "/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-res-158633cfa5273") #> flush(stdout()) #> flush(stderr()) #> NULL #> invisible() #> }, error = function(e) { #> { #> callr_data <- as.environment("tools:callr")$`__callr_data__` #> err <- callr_data$err #> capture.output(assign(".Traceback", traceback(9), envir = baseenv())) #> dump.frames("__callr_dump__") #> assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data) #> rm("__callr_dump__", envir = .GlobalEnv) #> e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e)) #> class(e2) <- c("callr_remote_error", class(e2)) #> e2$error <- e #> calls <- sys.calls() #> dcframe <- which(vapply(calls, function(x) length(x) >= #> 1 && identical(x[[1]], quote(do.call)), logical(1)))[1] #> if (!is.na(dcframe)) #> e2$`_ignore` <- list(c(1, dcframe + 1L)) #> e2$`_pid` <- Sys.getpid() #> e2$`_timestamp` <- Sys.time() #> if (inherits(e, "rlib_error")) #> e2$parent <- e$parent #> e2 <- err$add_trace_back(e2) #> saveRDS(list("error", e2), file = paste0("/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-res-158633cfa5273", #> ".error")) #> } #> }, interrupt = function(e) { #> { #> callr_data <- as.environment("tools:callr")$`__callr_data__` #> err <- callr_data$err #> capture.output(assign(".Traceback", traceback(9), envir = baseenv())) #> dump.frames("__callr_dump__") #> assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data) #> rm("__callr_dump__", envir = .GlobalEnv) #> e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e)) #> class(e2) <- c("callr_remote_error", class(e2)) #> e2$error <- e #> calls <- sys.calls() #> dcframe <- which(vapply(calls, function(x) length(x) >= #> 1 && identical(x[[1]], quote(do.call)), logical(1)))[1] #> if (!is.na(dcframe)) #> e2$`_ignore` <- list(c(1, dcframe + 1L)) #> e2$`_pid` <- Sys.getpid() #> e2$`_timestamp` <- Sys.time() #> if (inherits(e, "rlib_error")) #> e2$parent <- e$parent #> e2 <- err$add_trace_back(e2) #> saveRDS(list("error", e2), file = paste0("/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-res-158633cfa5273", #> ".error")) #> } #> }, callr_message = function(e) { #> try(signalCondition(e)) #> }), error = function(e) { #> NULL #> try(stop(e)) #> }, interrupt = function(e) { #> NULL #> e #> }) # mtcars %>% count(cyl) %>% piped_code_from_call() ``` And then e.g. convert to character and remove the %>% piped\_code\_from\_call() from it ``` r piped_code_from_call <- function(x, additional_arg = TRUE) { code <- deparse(sys.calls()[[1]]) stringr::str_remove(code, " %>% piped_code_from_call\\(\\)") } mtcars %>% count(cyl) %>% piped_code_from_call() #> [1] "tryCatch(withCallingHandlers({" #> [2] " NULL" #> [3] " saveRDS(do.call(do.call, c(readRDS(\"/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-fun-1586375dbfa00\"), " #> [4] " list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv, " #> [5] " quote = TRUE), file = \"/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-res-158633cfa5273\")" #> [6] " flush(stdout())" #> [7] " flush(stderr())" #> [8] " NULL" #> [9] " invisible()" #> [10] "}, error = function(e) {" #> [11] " {" #> [12] " callr_data <- as.environment(\"tools:callr\")$`__callr_data__`" #> [13] " err <- callr_data$err" #> [14] " capture.output(assign(\".Traceback\", traceback(9), envir = baseenv()))" #> [15] " dump.frames(\"__callr_dump__\")" #> [16] " assign(\".Last.dump\", .GlobalEnv$`__callr_dump__`, envir = callr_data)" #> [17] " rm(\"__callr_dump__\", envir = .GlobalEnv)" #> [18] " e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))" #> [19] " class(e2) <- c(\"callr_remote_error\", class(e2))" #> [20] " e2$error <- e" #> [21] " calls <- sys.calls()" #> [22] " dcframe <- which(vapply(calls, function(x) length(x) >= " #> [23] " 1 && identical(x[[1]], quote(do.call)), logical(1)))[1]" #> [24] " if (!is.na(dcframe)) " #> [25] " e2$`_ignore` <- list(c(1, dcframe + 1L))" #> [26] " e2$`_pid` <- Sys.getpid()" #> [27] " e2$`_timestamp` <- Sys.time()" #> [28] " if (inherits(e, \"rlib_error\")) " #> [29] " e2$parent <- e$parent" #> [30] " e2 <- err$add_trace_back(e2)" #> [31] " saveRDS(list(\"error\", e2), file = paste0(\"/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-res-158633cfa5273\", " #> [32] " \".error\"))" #> [33] " }" #> [34] "}, interrupt = function(e) {" #> [35] " {" #> [36] " callr_data <- as.environment(\"tools:callr\")$`__callr_data__`" #> [37] " err <- callr_data$err" #> [38] " capture.output(assign(\".Traceback\", traceback(9), envir = baseenv()))" #> [39] " dump.frames(\"__callr_dump__\")" #> [40] " assign(\".Last.dump\", .GlobalEnv$`__callr_dump__`, envir = callr_data)" #> [41] " rm(\"__callr_dump__\", envir = .GlobalEnv)" #> [42] " e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))" #> [43] " class(e2) <- c(\"callr_remote_error\", class(e2))" #> [44] " e2$error <- e" #> [45] " calls <- sys.calls()" #> [46] " dcframe <- which(vapply(calls, function(x) length(x) >= " #> [47] " 1 && identical(x[[1]], quote(do.call)), logical(1)))[1]" #> [48] " if (!is.na(dcframe)) " #> [49] " e2$`_ignore` <- list(c(1, dcframe + 1L))" #> [50] " e2$`_pid` <- Sys.getpid()" #> [51] " e2$`_timestamp` <- Sys.time()" #> [52] " if (inherits(e, \"rlib_error\")) " #> [53] " e2$parent <- e$parent" #> [54] " e2 <- err$add_trace_back(e2)" #> [55] " saveRDS(list(\"error\", e2), file = paste0(\"/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-res-158633cfa5273\", " #> [56] " \".error\"))" #> [57] " }" #> [58] "}, callr_message = function(e) {" #> [59] " try(signalCondition(e))" #> [60] "}), error = function(e) {" #> [61] " NULL" #> [62] " try(stop(e))" #> [63] "}, interrupt = function(e) {" #> [64] " NULL" #> [65] " e" #> [66] "})" # [1] "mtcars %>% count(cyl)" ``` But unfortunately this doesn’t work withg ggplot2 code: ``` r mtcars %>% count(cyl) %>% ggplot() + geom_point(aes(x = cyl)) %>% piped_code_from_call() #> Error: Can't add `geom_point(aes(x = cyl)) %>% piped_code_from_call()` to a ggplot object. # Error: Can't add `geom_point(aes(x = cyl)) %>% piped_code_from_call()` to a ggplot object. # Run `rlang::last_error()` to see where the error occurred. ``` It does to some extent if you wrap the entire pipeline / ggplot2 code in parentheses ``` r ( mtcars %>% count(cyl) %>% ggplot() + geom_point(aes(x = cyl)) ) %>% piped_code_from_call() #> [1] "tryCatch(withCallingHandlers({" #> [2] " NULL" #> [3] " saveRDS(do.call(do.call, c(readRDS(\"/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-fun-1586375dbfa00\"), " #> [4] " list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv, " #> [5] " quote = TRUE), file = \"/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-res-158633cfa5273\")" #> [6] " flush(stdout())" #> [7] " flush(stderr())" #> [8] " NULL" #> [9] " invisible()" #> [10] "}, error = function(e) {" #> [11] " {" #> [12] " callr_data <- as.environment(\"tools:callr\")$`__callr_data__`" #> [13] " err <- callr_data$err" #> [14] " capture.output(assign(\".Traceback\", traceback(9), envir = baseenv()))" #> [15] " dump.frames(\"__callr_dump__\")" #> [16] " assign(\".Last.dump\", .GlobalEnv$`__callr_dump__`, envir = callr_data)" #> [17] " rm(\"__callr_dump__\", envir = .GlobalEnv)" #> [18] " e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))" #> [19] " class(e2) <- c(\"callr_remote_error\", class(e2))" #> [20] " e2$error <- e" #> [21] " calls <- sys.calls()" #> [22] " dcframe <- which(vapply(calls, function(x) length(x) >= " #> [23] " 1 && identical(x[[1]], quote(do.call)), logical(1)))[1]" #> [24] " if (!is.na(dcframe)) " #> [25] " e2$`_ignore` <- list(c(1, dcframe + 1L))" #> [26] " e2$`_pid` <- Sys.getpid()" #> [27] " e2$`_timestamp` <- Sys.time()" #> [28] " if (inherits(e, \"rlib_error\")) " #> [29] " e2$parent <- e$parent" #> [30] " e2 <- err$add_trace_back(e2)" #> [31] " saveRDS(list(\"error\", e2), file = paste0(\"/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-res-158633cfa5273\", " #> [32] " \".error\"))" #> [33] " }" #> [34] "}, interrupt = function(e) {" #> [35] " {" #> [36] " callr_data <- as.environment(\"tools:callr\")$`__callr_data__`" #> [37] " err <- callr_data$err" #> [38] " capture.output(assign(\".Traceback\", traceback(9), envir = baseenv()))" #> [39] " dump.frames(\"__callr_dump__\")" #> [40] " assign(\".Last.dump\", .GlobalEnv$`__callr_dump__`, envir = callr_data)" #> [41] " rm(\"__callr_dump__\", envir = .GlobalEnv)" #> [42] " e2 <- err$new_error(conditionMessage(e), call. = conditionCall(e))" #> [43] " class(e2) <- c(\"callr_remote_error\", class(e2))" #> [44] " e2$error <- e" #> [45] " calls <- sys.calls()" #> [46] " dcframe <- which(vapply(calls, function(x) length(x) >= " #> [47] " 1 && identical(x[[1]], quote(do.call)), logical(1)))[1]" #> [48] " if (!is.na(dcframe)) " #> [49] " e2$`_ignore` <- list(c(1, dcframe + 1L))" #> [50] " e2$`_pid` <- Sys.getpid()" #> [51] " e2$`_timestamp` <- Sys.time()" #> [52] " if (inherits(e, \"rlib_error\")) " #> [53] " e2$parent <- e$parent" #> [54] " e2 <- err$add_trace_back(e2)" #> [55] " saveRDS(list(\"error\", e2), file = paste0(\"/var/folders/j5/d1hztrys1xzg_8557yh176300000gn/T//RtmplEEmi6/callr-res-158633cfa5273\", " #> [56] " \".error\"))" #> [57] " }" #> [58] "}, callr_message = function(e) {" #> [59] " try(signalCondition(e))" #> [60] "}), error = function(e) {" #> [61] " NULL" #> [62] " try(stop(e))" #> [63] "}, interrupt = function(e) {" #> [64] " NULL" #> [65] " e" #> [66] "})" # [1] "(mtcars %>% count(cyl) %>% ggplot() + geom_point(aes(x = cyl))) %>% " # [2] " piped_code_from_call()" ```
jhofman commented 3 years ago

@seankross: curious if you know any tricks that might work here, since you did the original pipeline parsing.

as per @sharlagelfand's comments above, seems like doing this with %>% is possible, but (literally) adding in ggplot commands with the + operator causes all kinds of trouble.

thanks for any ideas/tips/pointers!

seankross commented 3 years ago

Hey Jake and Sharla,

I would be happy to help think about this but I am underwater with work this week, but I can revisit this next week. All the work that has been done on this is so exciting!!

Sean

seankross commented 3 years ago

Hi Jake and Sharla,

My instinct for this to took really slick is to redefine the pipe (%>% and/or |>) and to redefine ggplot2's + (implementation here: https://github.dev/tidyverse/ggplot2/blob/master/R/plot-construction.r). Here is a proof of concept:

library(tidyverse)
library(rlang)

`%>%` <- function(lhs, rhs) {
  if(inherits(lhs, "quo_list")) {
    result <- c(lhs, enquo(rhs))
  } else {
    result <- list(enquo(lhs), enquo(rhs))
  }
  class(result) <- "quo_list"
  result
}

1:5 %>% sum()

# [[1]]
# <quosure>
#   expr: ^<int: 1L, 2L, 3L, 4L, 5L>
#   env:  empty
# 
# [[2]]
# <quosure>
#   expr: ^sum()
# env:  global
# 
# attr(,"class")
# [1] "quo_list"

diamonds %>%
  filter(carat > 1) %>% 
  ggplot(aes(carat, price, colour = cut)) +
    geom_point() %>% 
    piped_code_from_call()

# [[1]]
# <quosure>
#   expr: ^<tibble[,10]>
#   env:  empty
# 
# [[2]]
# <quosure>
#   expr: ^filter(carat > 1)
# env:  global
# 
# [[3]]
# <quosure>
#   expr: ^ggplot(aes(carat, price, colour = cut))
# env:  global
# 
# [[4]]
# <quosure>
#   expr: ^geom_point() %>% piped_code_from_call()
# env:  global
# 
# attr(,"class")
# [1] "quo_list"

As you can see this doesn't completely work because + has not been redefined, and that requires some S3 magic that I was hoping you could help figure out 😁. You could probably avoid using rlang too but it made this quick to whip up. Let me know what you think.

Sean

sharlagelfand commented 3 years ago

Awesome, thanks @seankross!! Will take a look over this 🙌🏻