r-lib / rlang

Low-level API for programming with R
https://rlang.r-lib.org
Other
508 stars 138 forks source link

tidyeval broken in dplyr::case_when/mutate #227

Closed ClaytonJY closed 6 years ago

ClaytonJY commented 7 years ago

My apologies if this is covered in another bug; couldn't find any obvious matches in the existing issues.

I discovered a useful code-pattern using dplyr::case_when inside dplyr::mutate that works fine in rlang 0.1.1, but is broken in rlang 0.1.1.9000:

Expected Behaviour (CRAN, 0.1.1)

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tbl <- tibble(x = 1:15)

tbl
#> # A tibble: 15 x 1
#>        x
#>    <int>
#>  1     1
#>  2     2
#>  3     3
#>  4     4
#>  5     5
#>  6     6
#>  7     7
#>  8     8
#>  9     9
#> 10    10
#> 11    11
#> 12    12
#> 13    13
#> 14    14
#> 15    15

f <- function(tbl, var) {

  var <- enquo(var)

  tbl %>%
    mutate(new = case_when(
      (!!var) %% 15 == 0 ~ "fizzbuzz",
      (!!var) %% 3  == 0 ~ "fizz",
      (!!var) %% 5  == 0 ~ "buzz",
      TRUE               ~ as.character(!!var)
    ))
}

f(tbl, x)
#> # A tibble: 15 x 2
#>        x      new
#>    <int>    <chr>
#>  1     1        1
#>  2     2        2
#>  3     3     fizz
#>  4     4        4
#>  5     5     buzz
#>  6     6     fizz
#>  7     7        7
#>  8     8        8
#>  9     9     fizz
#> 10    10     buzz
#> 11    11       11
#> 12    12     fizz
#> 13    13       13
#> 14    14       14
#> 15    15 fizzbuzz

devtools::session_info("dplyr")
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.1 (2017-06-30)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2017-08-03
#> Packages ------------------------------------------------------------------
#>  package    * version    date       source        
#>  assertthat   0.2.0      2017-04-11 CRAN (R 3.4.1)
#>  BH           1.62.0-1   2016-11-19 CRAN (R 3.4.0)
#>  bindr        0.1        2016-11-13 cran (@0.1)   
#>  bindrcpp   * 0.2        2017-06-17 CRAN (R 3.4.0)
#>  dplyr      * 0.7.2      2017-07-20 CRAN (R 3.4.1)
#>  glue         1.1.1      2017-06-21 CRAN (R 3.4.0)
#>  magrittr     1.5        2014-11-22 CRAN (R 3.4.1)
#>  pkgconfig    2.0.1      2017-03-21 cran (@2.0.1) 
#>  plogr        0.1-1      2016-09-24 cran (@0.1-1) 
#>  R6           2.2.2      2017-06-17 cran (@2.2.2) 
#>  Rcpp         0.12.12    2017-07-15 CRAN (R 3.4.1)
#>  rlang        0.1.1      2017-05-18 CRAN (R 3.4.1)
#>  tibble       1.3.3.9001 2017-08-03 local

Actual Behavior (Github master, 0.1.1.9000)

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tbl <- tibble(x = 1:15)

tbl
#> # A tibble: 15 x 1
#>        x
#>    <int>
#>  1     1
#>  2     2
#>  3     3
#>  4     4
#>  5     5
#>  6     6
#>  7     7
#>  8     8
#>  9     9
#> 10    10
#> 11    11
#> 12    12
#> 13    13
#> 14    14
#> 15    15

f <- function(tbl, var) {

  var <- enquo(var)

  tbl %>%
    mutate(new = case_when(
      (!!var) %% 15 == 0 ~ "fizzbuzz",
      (!!var) %% 3  == 0 ~ "fizz",
      (!!var) %% 5  == 0 ~ "buzz",
      TRUE               ~ as.character(!!var)
    ))
}

f(tbl, x)
#> Error in mutate_impl(.data, dots): Evaluation error: non-numeric argument to binary operator.

devtools::session_info("dplyr")
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.1 (2017-06-30)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2017-08-03
#> Packages ------------------------------------------------------------------
#>  package    * version    date       source                          
#>  assertthat   0.2.0      2017-04-11 CRAN (R 3.4.1)                  
#>  BH           1.62.0-1   2016-11-19 CRAN (R 3.4.0)                  
#>  bindr        0.1        2016-11-13 cran (@0.1)                     
#>  bindrcpp   * 0.2        2017-06-17 CRAN (R 3.4.0)                  
#>  dplyr      * 0.7.2      2017-07-20 CRAN (R 3.4.1)                  
#>  glue         1.1.1      2017-06-21 CRAN (R 3.4.0)                  
#>  magrittr     1.5        2014-11-22 CRAN (R 3.4.1)                  
#>  pkgconfig    2.0.1      2017-03-21 cran (@2.0.1)                   
#>  plogr        0.1-1      2016-09-24 cran (@0.1-1)                   
#>  R6           2.2.2      2017-06-17 cran (@2.2.2)                   
#>  Rcpp         0.12.12    2017-07-15 CRAN (R 3.4.1)                  
#>  rlang        0.1.1.9000 2017-08-03 Github (tidyverse/rlang@c07ca73)
#>  tibble       1.3.3.9001 2017-08-03 local

traceback

Couldn't get the whole thing to copy-paste well, but this is the last bit, with parent call mutate_impl(.data, dots):

stop(structure(list(message = "Evaluation error: non-numeric argument to binary operator.", 
    call = mutate_impl(.data, dots), cppstack = NULL), .Names = c("message", 
"call", "cppstack"), class = c("Rcpp::eval_error", "C++Error", 
"error", "condition")))

I suppose that suggests this is a C++ internal thing?

I'd be happy to follow-up, try new things, or even write tests, but the C++ internal stuff is a bit beyond me to help fix I think.

Thanks to all for excellent work making it easier to work programatically in the tidyverse; it's a little weird at first, but I'm hooked.

maurolepore commented 7 years ago

👍

lionel- commented 6 years ago

I'm not sure why it works on the CRAN version, but the current behaviour seems consistent. You can use enexpr() rather than enquo() to work around this. case_when() doesn't currently support quosures.

The interaction between case_when() and tidy eval functions is very tricky. An overscope is lexically scoped but here we need some dynamic behaviour in order to reuse the existing overscope set up by mutate().

I think what we need is a _tidyeval_overscope flag in the overscope (like the one in the non-formula quosures branch). Then when case_when() is evaluated in an overscope it can reuse it for its internal evaluation. I think this simple change should solve all evaluation problems with case_when().

lionel- commented 6 years ago

I'm sorry, case_when() vs tidy eval confuses me to no end. It should work and I now remember why ;)

lionel- commented 6 years ago

Reprex:

quo <- quo(~!!quo(x))
is_quosure(f_rhs(get_expr(quo)))
#> [1] TRUE

f <- eval_tidy(quo)
is_quosure(f_rhs(f))
#> [1] FALSE
lionel- commented 6 years ago

Finally fixed. Thanks for the report!