markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/
Other
450 stars 32 forks source link

Update internal `check_across` #739

Closed marianschmidt closed 1 year ago

marianschmidt commented 1 year ago

The selection logic of columns within dplyr::distinct() has been updated, so that it currently throws an error (and gives an unexpected result) if selection helpers are used without pick(). In the current tidytable version however, the internal check_across() function throws an error, if I use tidytable::pick().

#reprex tidytable distinct pick

library(tidyverse, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)
#> As of tidytable v0.9.0 dotless versions of functions are exported.
#> You can now use `arrange()`/`mutate()`/etc. directly.
#> Warning: tidytable was loaded after dplyr.
#> This can lead to most dplyr functions being overwritten by tidytable functions.
#> Warning: tidytable was loaded after tidyr.
#> This can lead to most tidyr functions being overwritten by tidytable functions.

iris %>%
  tidytable::distinct(tidyselect::all_of("Species"))
#> # A tidytable: 3 × 1
#>   Species   
#>   <fct>     
#> 1 setosa    
#> 2 versicolor
#> 3 virginica

iris %>%
  dplyr::distinct(tidyselect::all_of("Species"))
#> Warning: There was 1 warning in `dplyr::distinct()`.
#> ℹ In argument: `tidyselect::all_of("Species")`.
#> Caused by warning:
#> ! Using `all_of()` outside of a selecting function was deprecated in tidyselect
#>   1.2.0.
#> ℹ See details at
#>   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>
#>   tidyselect::all_of("Species")
#> 1                       Species

iris %>%
  tidytable::distinct(tidytable::pick(tidyselect::all_of("Species")))
#> Error in `check_across()`:
#> ! `across()`/`pick()` are unnecessary in `distinct()`.
#> Please directly use tidyselect.
#> Ex: df %>% distinct(where(is.numeric))

#> Backtrace:
#>     ▆
#>  1. ├─iris %>% ...
#>  2. └─tidytable::distinct(., tidytable::pick(tidyselect::all_of("Species")))
#>  3.   ├─tidytable::distinct.(.df, ..., .keep_all = .keep_all)
#>  4.   └─tidytable:::distinct..data.frame(.df, ..., .keep_all = .keep_all)
#>  5.     └─tidytable::distinct(.df, ..., .keep_all = .keep_all)
#>  6.       ├─tidytable::distinct.(.df, ..., .keep_all = .keep_all)
#>  7.       └─tidytable:::distinct..tidytable(.df, ..., .keep_all = .keep_all)
#>  8.         └─tidytable:::check_across(dots, "distinct")
#>  9.           └─rlang::abort(glue("`across()`/`pick()` are unnecessary in `{.fn}()`.\n         Please directly use tidyselect.\n         Ex: df %>% {.fn}(where(is.numeric))"))

iris %>%
  dplyr::distinct(dplyr::pick(tidyselect::all_of("Species")))
#>      Species
#> 1     setosa
#> 2 versicolor
#> 3  virginica

Created on 2023-03-01 with reprex v2.0.2

marianschmidt commented 1 year ago

Surprisingly I get the warning "Using all_of() outside of a selecting function was deprecated in tidyselect" even if I use tidyselect::distinct(). I use the function inside my package. So the workaround to leave it as it is (tidytable::distinct(tidyselect::all_of("Species"))) does create the above warning for me.

markfairbanks commented 1 year ago

The warning is letting you know what you should do in tidytable - which is use tidyselect directly and not use pick() or across(). This is a difference between dplyr and tidytable that has existed since the early days of tidytable and can't be changed.

tidytable

dplyr

library(magrittr)

# tidytable - no `pick()` required
iris %>%
  tidytable::distinct(tidyselect::all_of("Species"))
#> # A tidytable: 3 × 1
#>   Species   
#>   <fct>     
#> 1 setosa    
#> 2 versicolor
#> 3 virginica

# dplyr - requires `pick()`
iris %>%
  dplyr::distinct(dplyr::pick(tidyselect::all_of("Species")))
#>      Species
#> 1     setosa
#> 2 versicolor
#> 3  virginica

Edit: This is what I mean by "mutate semantics" - this works in dplyr but fails in tidytable. This example is why dplyr needs across()/pick() to use tidyselect in distinct().

I'm not sure if most people know that dplyr::distinct() isn't using selection - it's creating new columns using mutate/transmute.

library(dplyr, warn.conflicts = FALSE)

as_tibble(iris) %>%
  distinct(id = row_number(), Species)
#> # A tibble: 150 × 2
#>       id Species
#>    <int> <fct>  
#>  1     1 setosa 
#>  2     2 setosa 
#>  3     3 setosa 
#>  4     4 setosa 
#>  5     5 setosa 
#>  6     6 setosa 
#>  7     7 setosa 
#>  8     8 setosa 
#>  9     9 setosa 
#> 10    10 setosa 
#> # … with 140 more rows
marianschmidt commented 1 year ago

So when using

iris %>%
  tidytable::distinct(tidyselect::all_of("Species"))

would you expect the tidyselect warning in some scenarios? I still need to replicate in a simple example why I suddenly got the warning, although I have not changed the script.

markfairbanks commented 1 year ago

No - there shouldn't be any tidyselect warning when using tidytable::distinct().

Edit: If you wouldn't mind keeping me updated if you do figure it out. I know tidyselect recently updated all_of() (see tidyselect NEWS). But this shouldn't affect tidytable::distinct().

* Using `all_of()` outside of a tidyselect context is now deprecated (#269).
  In the future it will error to be consistent with `any_of()`.