tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/
Other
4.75k stars 2.12k forks source link

Prefixing certain functions inside `filter` with package name yields unexpected errors. #2930

Closed ghost closed 6 years ago

ghost commented 7 years ago

Issue: Prefixing certain functions inside dplyr::filter with <package_name>:: yields unexpected errors, but omitting the prefix results in no error. I would not expect functions inside packages to behave differently depending on whether or not the function name is prefixed by <package_name>::. (When building packages, I like to prefix, to make explicit where functions are coming from.)

Reference: https://stackoverflow.com/questions/44712189/dplyr-0-7-0-tidyeval-in-packages/44714143)

Example: Suppose we have this data frame:

> x <- tibble::tibble(v = 1:3, w = 2)

This works:

> y <- dplyr::filter(x, .data$v > .data$w)

This yields an error:

> y <- dplyr::filter(x, rlang::.data$v > rlang::.data$w)

Error in filter_impl(.data, quo) : 
  Evaluation error: Object `v` not found in data.

The nature of this issue is similar to the use of n(). For example, this works:

> z <- dplyr::filter(x, v == n())

This yields an error:

> z <- dplyr::filter(x, v == dplyr::n())

Error in filter_impl(.data, quo) : 
  Evaluation error: This function should not be called directly.

The cause of the error with dplyr::n() is made clearer when the function is viewed in R:

> dplyr:::n
function () 
{
    abort("This function should not be called directly")
}
<environment: namespace:dplyr>

However, I'm not sure what's causing the first error.

krlmlr commented 7 years ago

Thanks. The behavior you are seeing is expected, because n() and .data are evaluated with slightly different semantics in the hybrid evaluator than with standard evaluation. We could make the error more explicit for .data, though.

lionel- commented 7 years ago

Ideally prefixing with a namespace should work.

hadley commented 7 years ago

Ideally, but this seems fairly low priority to me.

infotroph commented 6 years ago

I'm not certain this is the same issue, but seems closely related: If I extend the examples above so that x is a database table, .data does not work even without the namespace.

x <- tibble::tibble(v = 1:3, w = 2)
db = DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
dplyr::copy_to(db, x, "x")
x2 <- dplyr::tbl(db, "x")

y <- dplyr::filter(x, .data$v > .data$w) # as above, succeeds
y2 <- dplyr::filter(x2, .data$v > .data$w)
#> Error in eval_bare(call, env): object '.data' not found

y <- dplyr::filter(x, rlang::.data$v > rlang::.data$w) # as above, still fails
#> Error in filter_impl(.data, quo): Evaluation error: Column `v` not found in `.data`.
y2 <- dplyr::filter(x2, rlang::.data$v > rlang::.data$w)
#> Error: Column `v` not found in `.data`

Created on 2018-01-28 by the reprex package (v0.1.1.9000).

Longer version with session info: https://gist.github.com/infotroph/19513e684c97b576e24c8b1058b082ee

romainfrancois commented 6 years ago

This is mostly fixed as part of #3456.

> dplyr::filter(x, v == dplyr::n())
# A tibble: 1 x 2
      v     w
  <int> <dbl>
1     3    2.

.data is different, but I'm not sure why you'd want to prefix it with rlang.

ghost commented 6 years ago

Hi @romainfrancois ,

In package development, I generally prefer to prefix function names with the package name and ::, rather than using @importFrom. Furthermore, on 23 March 2014, @hadley said that this was his preference (see his first comment to the accepted answer of the stack overflow thread at https://stackoverflow.com/questions/8597993/does-roxygen2-automatically-write-namespace-directives-for-imports-packages). Of course, given that his comment is over four years old, he may have revised his advice more recently.

In summary, if there needs to be an exception for rlang::.data for a good reason, then I'm happy to accept that.

hadley commented 6 years ago

.data is not a function call - it's syntax that's part of tidy evaluation. rlang::.data is only provided to allow you to suppress R CMD check warnings; it doesn't actually do anything.

lionel- commented 6 years ago

This is similar to the magrittr pronoun:

# Good
x %>% list(., .)

# oh no!
x %>% list(magrittr::., magrittr::.)
romainfrancois commented 6 years ago

So do we need to do something about rlang::.data or can we close this ?

These will quiet R CMD check:

globalVariables(".data") 

or:

#' @importFrom rlang .data
lionel- commented 6 years ago

I think we can close, .data should not be prefixed as it's a pronoun that lives inside the data mask with the columns of the data frame, it does not actually live in rlang.

By the way, prefixing uq operators is going to be deprecated so please don't write rlang::UQ(x) but UQ(x). Or better yet, !!x ;). The rationale is that unquote operators are not (and couldn't be) function calls. For the same reason we are ambivalent about keeping UQ() at all because it overloads the syntax of function calls with something completely different.

lock[bot] commented 5 years ago

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/