tidyverse / purrr

A functional programming toolkit for R
https://purrr.tidyverse.org/

purrr::map() support for SE/variable column names? #339

Closed OmaymaS closed 7 years ago

OmaymaS commented 7 years ago

I'd like to pass a variable column name to map(), but I am not sure whether it is supported.

Example:

dat <- structure(list(Tags_terms = list(c("Tag1", "Tag2","Tag3"),
                                        c("Tag1"),
                                        c("Tag1"))),
                 .Names = "Tags_terms",
                 row.names = c(NA, -3L),
                 class = c("tbl_df","tbl", "data.frame"))

# A tibble: 3 × 1
  Tags_terms
      <list>
1  <chr [3]>
2  <chr [1]>
3  <chr [1]>

Instead of the following straightforward way:

dat %>% 
  mutate(tags_count = map_int(Tags_terms, length))

# A tibble: 3 × 2
  Tags_terms tags_count
      <list>      <int>
1  <chr [3]>          3
2  <chr [1]>          1
3  <chr [1]>          1

I'd like to write something like:

cname <- "Tags_terms"

dat %>% 
  mutate(tags_count = map_int(cname, length))

jennybc commented 7 years ago

The preferred way to program around dplyr now is to use "tidy eval", which comes from the rlang package. However, note that dplyr re-exports the main functions you need. One way to do what you want is to capture "Tags_terms" in a so-called quosure and then unquote or evaluate it inside map_int() via !!:

library(dplyr)
library(purrr)

dat <- tribble(
  ~ Tags_terms,
  c("Tag1", "Tag2","Tag3"),
  "Tag1",
  "Tag1"
)

cname <- quo(Tags_terms)
dat %>% 
  dplyr::mutate(tags_count = map_int(!! cname, length))
#> # A tibble: 3 x 2
#>   Tags_terms tags_count
#>       <list>      <int>
#> 1  <chr [3]>          3
#> 2  <chr [1]>          1
#> 3  <chr [1]>          1

If you really want to specify the column via a string, inside a map() inside mutate(), I believe you need to load rlang explicitly, so you can call syms().

library(rlang)
cname <- syms("Tags_terms")
dat %>% 
  mutate(tags_count = map_int(!!! cname, length))
#> # A tibble: 3 x 2
#>   Tags_terms tags_count
#>       <list>      <int>
#> 1  <chr [3]>          3
#> 2  <chr [1]>          1
#> 3  <chr [1]>          1
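
For comparison, here is a minimal sketch of another way to refer to a column by string: the `.data` pronoun, which dplyr 0.7.0 and later make available inside verbs such as mutate(). This is a different technique from the syms() call above and is not something suggested in this thread. The example rebuilds `dat` with tibble() so that it is self-contained.

```r
library(dplyr)
library(purrr)
library(tibble)

# Same data as above, built with an explicit list-column.
dat <- tibble(
  Tags_terms = list(c("Tag1", "Tag2", "Tag3"), "Tag1", "Tag1")
)

cname <- "Tags_terms"

# .data[[cname]] looks up the column named by the string stored in cname,
# so no symbol conversion or unquoting is needed.
result <- dat %>%
  mutate(tags_count = map_int(.data[[cname]], length))
```

Here `result$tags_count` holds one integer per row, the length of each entry in the list-column.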

I am still working through this transition myself, so I trust that @lionel- will check me here 🙂.

OmaymaS commented 7 years ago

Thanks. I am getting the error shown below. Do I need to update a package, or am I missing something?

library(purrr)
library(tibble)
library(rlang)

dat <- tribble(
  ~ Tags_terms,
  c("Tag1", "Tag2","Tag3"),
  "Tag1",
  "Tag1"
)

cname <- quo(Tags_terms)

dat %>% 
  dplyr::mutate(tags_count = map_int(!! cname, length))

Error: invalid argument type
In addition: Warning message:
failed to assign NativeSymbolInfo for env since env is already defined in the ‘lazyeval’ namespace 
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rlang_0.1.1  tibble_1.3.0 purrr_0.2.2 

loaded via a namespace (and not attached):
[1] lazyeval_0.2.0 R6_2.1.2       assertthat_0.1 magrittr_1.5   DBI_0.6-1      tools_3.3.1    dplyr_0.5.0   
[8] Rcpp_0.12.6

amarchin commented 7 years ago

I think you need dplyr 0.7.0 to use the new tidy eval framework; your session info shows dplyr 0.5.0. Try updating the package.

OmaymaS commented 7 years ago

Thanks. I updated dplyr and this approach works for now. I'll follow up to see whether @jennybc or @lionel- will share any other approaches.

lionel- commented 7 years ago

Sorry, I forgot to reply. The preferred way to inline a column name is to unquote a symbol:

cname <- "Tags_terms"
mutate(dat, tags_count = map_int(!! rlang::sym(cname), length))

You can verify that mutate() sees the right expression by wrapping with expr() or quo():

rlang::expr(mutate(dat, tags_count = map_int(!! rlang::sym(cname), length)))
#> mutate(dat, tags_count = map_int(Tags_terms, length))
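
The same pattern can be wrapped in a function that takes the column name as a string. The following is only a sketch: `count_elements()` is a hypothetical helper name, not part of purrr, dplyr, or rlang, and it assumes dplyr 0.7.0 or later.

```r
library(dplyr)
library(purrr)
library(tibble)

dat <- tibble(
  Tags_terms = list(c("Tag1", "Tag2", "Tag3"), "Tag1", "Tag1")
)

# Hypothetical helper: counts the elements in each entry of the
# list-column whose name is given as a string in `cname`.
count_elements <- function(data, cname) {
  col <- rlang::sym(cname)  # convert the string to a symbol
  # Unquote the symbol so mutate() sees the bare column name.
  mutate(data, tags_count = map_int(!!col, length))
}

result <- count_elements(dat, "Tags_terms")
```

Converting the string to a symbol once, outside mutate(), keeps the unquoting step short and makes it easy to reuse the symbol in several expressions.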