tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

filter should warn or prevent users from using named logical inputs #7105

Open conig opened 1 week ago

conig commented 1 week ago

Currently, filter() throws an error when a user accidentally writes = instead of ==. However, no error is thrown when the named input is a scalar logical such as TRUE.

Demonstration

library(dplyr)

mtcars$big_cyl <- mtcars$cyl > 4
# Mistaking = for == silently fails, returning the whole dataset
filter(mtcars, big_cyl = TRUE) |>
  nrow()
#> [1] 32

Correctly using == for comparison

# Correctly using ==.
filter(mtcars, big_cyl == TRUE) |>
  nrow()
#> [1] 21

Although writing x == TRUE is bad practice, I think this is bound to trip up some users, and an error should be thrown here as well.
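Until something like this exists in dplyr, a user-side guard is one possible stopgap. The following is only a minimal sketch: strict_filter() is a hypothetical helper, not part of dplyr, and it assumes that rejecting every named input is acceptable, which means it also rejects filter()'s own named arguments such as .by and .preserve.

```r
library(dplyr)
library(rlang)

# Hypothetical helper (not part of dplyr): refuses any named input,
# including scalar logicals such as `big_cyl = TRUE`.
# Caveat: it also refuses filter()'s own named arguments (.by, .preserve).
strict_filter <- function(.data, ...) {
  dots <- enquos(...)
  named <- have_name(dots)  # TRUE for every input supplied as name = value
  if (any(named)) {
    bad <- names(dots)[named]
    abort(c(
      "Named inputs are not allowed in `strict_filter()`.",
      i = paste0(
        "Did you mean `==` instead of `=` for: ",
        paste(bad, collapse = ", "), "?"
      )
    ))
  }
  dplyr::filter(.data, !!!dots)
}

cars <- mtcars
cars$big_cyl <- cars$cyl > 4

# Unnamed comparisons pass straight through to filter()
nrow(strict_filter(cars, big_cyl == TRUE))
#> [1] 21

# strict_filter(cars, big_cyl = TRUE) now raises an error instead of
# silently returning all 32 rows.
```

Rejecting all named inputs is a blunter rule than the issue asks for, but it keeps the sketch short and makes the = versus == mistake impossible to miss.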

Additional context

Example of the error working correctly:

# version 1.1.4
dplyr::filter(mtcars, cyl = "4")
#> Error in `dplyr::filter()`:
#> ! We detected a named input.
#> ℹ This usually means that you've used `=` instead of `==`.
#> ℹ Did you mean `cyl == "4"`?

Interestingly, if the TRUE is wrapped in c(), the error is thrown.

dplyr::filter(mtcars, big_cyl = c(TRUE))
#> Error in `dplyr::filter()`:
#> ! We detected a named input.
#> ℹ This usually means that you've used `=` instead of `==`.
#> ℹ Did you mean `big_cyl == c(TRUE)`?
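One plausible explanation for the difference, assuming (this is not confirmed anywhere in this issue) that the check inspects the unevaluated expression and lets anything that is already a logical value through: a bare TRUE is a logical constant even before evaluation, whereas c(TRUE) is still a call. The base R sketch below only illustrates that distinction and does not reproduce dplyr's internals.

```r
# Unevaluated expressions, as a quoting function would capture them
scalar_expr <- quote(TRUE)
vector_expr <- quote(c(TRUE))

is.logical(scalar_expr)  # TRUE: a bare TRUE is already a logical constant
is.logical(vector_expr)  # FALSE: c(TRUE) is an unevaluated call
is.call(vector_expr)     # TRUE

# Once evaluated, both are the same logical vector
identical(eval(scalar_expr), eval(vector_expr))  # TRUE
```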