tidyverse / funs

Collection of low-level functions for working with vctrs

migrate plyr::mapvalues() to vctrs? #15

Open sfirke opened 7 years ago

sfirke commented 7 years ago

mapvalues() is very useful. I use it often. And I don't know of a good replacement.

As a diehard tidyverse user, I find this awkward: there are lots of posts about headaches from incorrectly loading plyr and dplyr together, and mapvalues() currently sits officially outside the tidyverse, since library(tidyverse) does not give you access to that function.

As plyr slowly fades out in favor of dplyr, more and more people will find it clunky to call that one great function from an otherwise deprecated package.

Would vctrs be the place for mapvalues, or a similar function, in the tidyverse?

sfirke commented 7 years ago

dplyr::recode comes close for some smaller applications, but doesn't seem to allow for supplying two vectors from other sources (say, an existing crosswalk table).
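To make the crosswalk use case concrete, here is a minimal base R sketch of a mapvalues()-style lookup built on match(). The helper name map_values() and the crosswalk data frame are hypothetical, made up for illustration; this is not the plyr implementation and not an existing tidyverse function.

```r
# Hypothetical helper (not from plyr, dplyr, or vctrs): replace each value of
# `x` found in `from` with the value at the same position in `to`.
map_values <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  idx <- match(x, from)      # position of each x in `from`, NA when absent
  hit <- !is.na(idx)
  out <- x
  out[hit] <- to[idx[hit]]   # unmatched values are left unchanged
  out
}

# An illustrative crosswalk table holding the old and new values as columns.
crosswalk <- data.frame(
  old = c("January", "April", "July", "October"),
  new = c("winter", "spring", "summer", "fall"),
  stringsAsFactors = FALSE
)

result <- map_values(month.name, crosswalk$old, crosswalk$new)
```

The two vectors can come from anywhere, which is the part that recode()'s `...` interface makes awkward. The sketch assumes `x`, `from`, and `to` share a type; it does no type checking or factor handling.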

jennybc commented 7 years ago

I also sorely miss plyr::revalue(). I miss it more than mapvalues().

The recode() API forces you to provide the this = that pairs via `...`. But that doesn't scale well, and I often have this information in a named vector because I need it elsewhere. I am fiddling with the lift_*() functions in purrr, but it feels harder than it should be.

revalue(x, replace = NULL, warn_missing = TRUE)

x            factor or character vector to modify
replace      named character vector, with new values as values, and old values as names.
warn_missing print a message if any of the old values are not actually present in x

versus

recode(.x, ..., .default = NULL, .missing = NULL)

.x           A vector to modify
...          Replacements. These should be named for character and factor .x, and can be named for numeric .x.
             All replacements must be the same type, and must have either length one or the same length as .x.

I like .default and .missing from recode() and replace from revalue(). I wish plyr::revalue() and dplyr::recode() would get together and have a baby 👼.
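One way to picture that combination is the base R sketch below. It takes a named vector like revalue()'s replace argument, plus .default and .missing arguments modeled loosely on recode(). The function name recode_with() is hypothetical, the sketch handles character vectors only, and it is an assumption about how the two interfaces could be merged, not code from either package.

```r
# Hypothetical function (exists in neither plyr nor dplyr): names of `replace`
# are the old values, its values are the new ones, as in plyr::revalue().
recode_with <- function(x, replace, .default = NULL, .missing = NULL) {
  stopifnot(is.character(x), is.character(replace), !is.null(names(replace)))
  idx <- match(x, names(replace))
  is_na <- is.na(x)
  hit <- !is.na(idx) & !is_na
  out <- x
  # Unmatched, non-missing values are kept unless .default is supplied.
  if (!is.null(.default)) out[!hit & !is_na] <- .default
  out[hit] <- unname(replace[idx[hit]])
  # Missing values stay NA unless .missing is supplied.
  if (!is.null(.missing)) out[is_na] <- .missing
  out
}

nv <- c(January = "winter", July = "summer")
x <- c("January", "May", NA, "July")

kept <- recode_with(x, nv)
# "winter" "May" NA "summer"

filled <- recode_with(x, nv, .default = "other", .missing = "unknown")
# "winter" "other" "unknown" "summer"
```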

t-kalinowski commented 7 years ago

:+1: other alternatives I've tried using modern tidyverse functions feel clunky: a) do.call(recode, list(.x = val, named_new_vals)) or b) factor(val) %>% forcats::fct_recode(named_new_vals) %>% as.character()

samclifford commented 7 years ago

My replacement for loading plyr is to use the following lines:

mapvalues <- plyr::mapvalues
revalue <- plyr::revalue

I still have to have plyr installed, but at least I'm not introducing naming conflicts.

An alternative approach could be the use of case_when.

hadley commented 5 years ago

@jennybc I think revalue(x, y) might be equivalent to recode(x, !!!y)?

jennybc commented 5 years ago

I haven't done an in-depth exploration, but, at least superficially, yes?

revalue <- plyr::revalue
recode <- dplyr::recode

mths <- c("January", "April", "July", "October")
(ssns <- c("winter", "spring", "summer", "fall"))
#> [1] "winter" "spring" "summer" "fall"
(nv <- setNames(ssns, mths))
#>  January    April     July  October 
#> "winter" "spring" "summer"   "fall"

revalue(month.name, replace = nv)
#>  [1] "winter"    "February"  "March"     "spring"    "May"      
#>  [6] "June"      "summer"    "August"    "September" "fall"     
#> [11] "November"  "December"

recode(month.name, !!!nv)
#>  [1] "winter"    "February"  "March"     "spring"    "May"      
#>  [6] "June"      "summer"    "August"    "September" "fall"     
#> [11] "November"  "December"

Created on 2018-09-24 by the reprex package (v0.2.1)

njtierney commented 5 years ago

In case it is helpful: plyr::revalue appears to be somewhat faster than dplyr::recode, though perhaps the difference is too small to be of practical importance:

revalue <- plyr::revalue
recode <- dplyr::recode

months <- c("January", "April", "July", "October")
seasons <- c("winter", "spring", "summer", "fall")
months_as_seasons <- setNames(seasons, months)

library(bench)
bm1 <- mark(
  revalue = revalue(month.name, replace = months_as_seasons),
  recode = recode(month.name, !!!months_as_seasons)
)

library(ggplot2)
autoplot(bm1)

bm1
#> # A tibble: 2 x 10
#>   expression    min   mean median   max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>      <bch:> <bch:> <bch:> <bch>     <dbl> <bch:byt> <dbl> <int>
#> 1 revalue    41.2µs   53µs   48µs 263µs    18871.    23.4KB     8  8497
#> 2 recode     53.3µs 67.2µs 62.9µs 655µs    14878.   104.1KB    11  7005
#> # ... with 1 more variable: total_time <bch:tm>
summary(bm1, relative = TRUE)
#> # A tibble: 2 x 10
#>   expression   min  mean median   max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>      <dbl> <dbl>  <dbl> <dbl>     <dbl>     <dbl> <dbl> <dbl>
#> 1 revalue     1     1      1     1         1.27      1     1     1.21
#> 2 recode      1.29  1.27   1.31  2.49      1         4.46  1.38  1   
#> # ... with 1 more variable: total_time <dbl>

Created on 2018-09-26 by the reprex package (v0.2.0.9000).

aloboa commented 1 year ago

I'm confused: do we have an equivalent to plyr::mapvalues() in the tidyverse or not?