tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/
Other
4.77k stars 2.12k forks source link

Unexpected behavior in filter when not unquoting the quosure #3476

Closed ColinFay closed 6 years ago

ColinFay commented 6 years ago

Context

I was writing a blogpost about tidyeval, with as an example a very simple function based on filter.

To stress the "two steps process" (enquo then !!), I wanted to build a function without !! to show the returned error.

Issue

It seems that filter doesn't return an error when we "forget" to unquote the quosure.

#Doesn't throw an error (as could be expected) 
custom_filter <- function(df, col, threshold){
  col <- enquo(col)
  filter(df, col > threshold)
}

custom_filter(iris, Sepal.Length, 2)
[1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species     
<0 lignes> (ou 'row.names' de longueur nulle)

And

# Throws an error (as expected) : 
custom_select <- function(df, col){
  col <- enquo(col)
  select(df, col)
}
custom_select(iris, Species)
Erreur : `col` must resolve to integer column positions, not formula 

After a quick private conversation with @romainfrancois on Slack, he pointed out that it seems to be due to the behavior of formulas:

> col <- quo(col)
> col < 2
[1] TRUE
> col > 2
[1] FALSE
> col <- quote(~col)
> col < 2
[1] TRUE
> col > 2
[1] FALSE

It seems that formulas and quosures can be evaluated against a number, leading to an unexpected behavior in filter when we forget to !! a quosure.

This might be more an rlang issue, so just let me know and I'll move this issue on the rlang repo if needed.

romainfrancois commented 6 years ago

Yeah so as far as filter is concerned, it looks like you either do filter(iris, FALSE) or filter(iris, TRUE).

> ir <- as_tibble(iris)
> filter(ir, TRUE)
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1         5.10        3.50         1.40       0.200 setosa 
 2         4.90        3.00         1.40       0.200 setosa 
 3         4.70        3.20         1.30       0.200 setosa 
 4         4.60        3.10         1.50       0.200 setosa 
 5         5.00        3.60         1.40       0.200 setosa 
 6         5.40        3.90         1.70       0.400 setosa 
 7         4.60        3.40         1.40       0.300 setosa 
 8         5.00        3.40         1.50       0.200 setosa 
 9         4.40        2.90         1.40       0.200 setosa 
10         4.90        3.10         1.50       0.100 setosa 
# ... with 140 more rows
> filter(ir, quo(col) < 2 )
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1         5.10        3.50         1.40       0.200 setosa 
 2         4.90        3.00         1.40       0.200 setosa 
 3         4.70        3.20         1.30       0.200 setosa 
 4         4.60        3.10         1.50       0.200 setosa 
 5         5.00        3.60         1.40       0.200 setosa 
 6         5.40        3.90         1.70       0.400 setosa 
 7         4.60        3.40         1.40       0.300 setosa 
 8         5.00        3.40         1.50       0.200 setosa 
 9         4.40        2.90         1.40       0.200 setosa 
10         4.90        3.10         1.50       0.100 setosa 
# ... with 140 more rows
> 
> filter(ir, FALSE)
# A tibble: 0 x 5
# ... with 5 variables: Sepal.Length <dbl>, Sepal.Width <dbl>, Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>
> filter(ir, quo(col) > 2 )
# A tibble: 0 x 5
# ... with 5 variables: Sepal.Length <dbl>, Sepal.Width <dbl>, Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>

So not sure what we can do at the dplyr level.

Perhaps rlang could be defensive @lionel-, but I guess with a better message 🙈

`<.quosure` <- function(x, y){ stop( emo::ji("scream") ) }
> filter(ir, quo(col) < 2 )
 Show Traceback

 Rerun with Debug
 Error in filter_impl(.data, quo) : Evaluation error: 😱. 

Might be worth implementing all the Ops methods ?

lionel- commented 6 years ago

yup this is standard R (nonsensical) behaviour where expressions are comparable:

2 > ~foo
#> [1] TRUE

Might be worth implementing all the Ops methods ?

This sounds like a good idea!

krlmlr commented 6 years ago

Thanks for looking into this!

lock[bot] commented 6 years ago

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/