Open sfirke opened 7 years ago
`dplyr::recode` comes close for some smaller applications, but doesn't seem to allow for supplying two vectors from other sources (say, an existing crosswalk table).
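To make the crosswalk use case concrete, here is a minimal base R sketch of the kind of lookup being asked for. The `crosswalk` data frame, its `old` and `new` columns, and the vector `x` are made up for illustration; this shows the desired behaviour using `match()`, not anything built into `dplyr::recode`.

```r
# Hypothetical crosswalk table: one column of old codes, one of new labels.
crosswalk <- data.frame(
  old = c("a", "b", "c"),
  new = c("apple", "banana", "cherry"),
  stringsAsFactors = FALSE
)

x <- c("a", "c", "z", "b", "a")

# Position of each element of x in the old column (NA where there is no match).
idx <- match(x, crosswalk$old)

# Take the new label where a match exists; otherwise keep the original value.
recoded <- ifelse(is.na(idx), x, crosswalk$new[idx])
recoded
```

Values without an entry in the crosswalk (here `"z"`) pass through unchanged, which is the behaviour `mapvalues()` users tend to expect.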
I also sorely miss `plyr::revalue()`. I miss it more than `mapvalues()`.
The `recode()` API forces you to provide the `this = that` via `...`. But that doesn't scale well and I often have this information in a named vector because I need it elsewhere. I am fiddling with the `lift_*()` functions in purrr, but it feels harder than it should be.
revalue(x, replace = NULL, warn_missing = TRUE)
x factor or character vector to modify
replace named character vector, with new values as values, and old values as names.
warn_missing print a message if any of the old values are not actually present in x
versus
recode(.x, ..., .default = NULL, .missing = NULL)
.x A vector to modify
... Replacements. These should be named for character and factor .x, and can be named for numeric .x.
All replacements must be the same type, and must have either length one or the same length as x.
I like `.default` and `.missing` from `recode()` and `replace` from `revalue()`. I wish `plyr::revalue()` and `dplyr::recode()` would get together and have a baby 👼.
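For illustration only, the combination described above could look roughly like the following base R sketch. `recode_with()` is a hypothetical helper, not a function in plyr or dplyr; it assumes character input and a `replace` vector shaped like the one `revalue()` takes (old values as names, new values as values).

```r
# Hypothetical helper (not part of plyr or dplyr): takes a named vector like
# revalue()'s `replace`, plus optional `default` and `missing` arguments in
# the spirit of recode()'s .default and .missing.
recode_with <- function(x, replace, default = NULL, missing = NULL) {
  x <- as.character(x)
  idx <- match(x, names(replace))
  out <- unname(replace)[idx]          # NA wherever x has no entry in `replace`
  unmatched <- is.na(idx) & !is.na(x)
  # Unmatched values keep their original value unless a default is given.
  out[unmatched] <- if (is.null(default)) x[unmatched] else default
  # Missing inputs stay NA unless a replacement for NA is given.
  if (!is.null(missing)) out[is.na(x)] <- missing
  out
}

nv <- c(January = "winter", July = "summer")
vals <- c("January", "May", NA, "July")

kept <- recode_with(vals, nv)
filled <- recode_with(vals, nv, default = "other", missing = "unknown")
```

Here `kept` leaves `"May"` and the `NA` untouched, while `filled` replaces them with `"other"` and `"unknown"`.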
:+1:
Other alternatives I've tried using modern tidyverse functions feel clunky:
a) do.call(recode, list(.x = val, named_new_vals))
or
b) factor(val) %>% forcats::fct_recode(named_new_vals) %>% as.character()
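As far as I can tell, both snippets need small adjustments before they run: `recode()` wants each replacement as its own named argument, and `fct_recode()` expects `new = "old"` pairs rather than `old = "new"`. A sketch under those assumptions, with `val` and `named_new_vals` filled in with made-up values:

```r
val <- c("January", "May", "July")
named_new_vals <- c(January = "winter", July = "summer")  # old = new

# a) do.call() passes each list element as one argument, so the named
#    vector is converted to a named list and appended to .x.
res_a <- do.call(dplyr::recode, c(list(.x = val), as.list(named_new_vals)))

# b) fct_recode() expects new = "old", the reverse of the vector above,
#    so names and values are swapped before splicing with !!!.
new_to_old <- setNames(names(named_new_vals), named_new_vals)
res_b <- as.character(forcats::fct_recode(factor(val), !!!new_to_old))
```

Both versions work, which rather supports the point that they are clunky.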
My replacement for loading plyr is to use the following lines:
mapvalues <- plyr::mapvalues
revalue <- plyr::revalue
Still have to have plyr installed but at least I'm not introducing naming conflicts.
An alternative approach could be the use of `case_when`.
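A rough sketch of what the `case_when` route might look like, using made-up month data. Each mapping still has to be written out by hand, so it probably does not help when the pairs already live in a named vector or crosswalk table.

```r
x <- c("January", "February", "April", "July", "October")

seasons <- dplyr::case_when(
  x == "January" ~ "winter",
  x == "April"   ~ "spring",
  x == "July"    ~ "summer",
  x == "October" ~ "fall",
  TRUE           ~ x   # anything else keeps its original value
)
seasons
```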
@jennybc I think `revalue(x, y)` might be equivalent to `recode(x, !!!y)`?
I haven't done an in-depth exploration, but, at least superficially, yes?
revalue <- plyr::revalue
recode <- dplyr::recode
mths <- c("January", "April", "July", "October")
(ssns <- c("winter", "spring", "summer", "fall"))
#> [1] "winter" "spring" "summer" "fall"
(nv <- setNames(ssns, mths))
#>  January    April     July  October
#> "winter" "spring" "summer"   "fall"
revalue(month.name, replace = nv)
#>  [1] "winter"    "February"  "March"     "spring"    "May"
#>  [6] "June"      "summer"    "August"    "September" "fall"
#> [11] "November"  "December"
recode(month.name, !!!nv)
#>  [1] "winter"    "February"  "March"     "spring"    "May"
#>  [6] "June"      "summer"    "August"    "September" "fall"
#> [11] "November"  "December"
Created on 2018-09-24 by the reprex package (v0.2.1)
I just wanted to add in here, in case it is helpful, that `plyr::revalue` appears to be somewhat faster than `dplyr::recode`, but perhaps the scale of this difference is not of practical importance:
revalue <- plyr::revalue
recode <- dplyr::recode
months <- c("January", "April", "July", "October")
seasons <- c("winter", "spring", "summer", "fall")
months_as_seasons <- setNames(seasons, months)
library(bench)
bm1 <- mark(
  revalue = revalue(month.name, replace = months_as_seasons),
  recode = recode(month.name, !!!months_as_seasons)
)
library(ggplot2)
autoplot(bm1)
bm1
#> # A tibble: 2 x 10
#>   expression    min   mean median   max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>      <bch:> <bch:> <bch:> <bch>     <dbl> <bch:byt> <dbl> <int>
#> 1 revalue    41.2µs   53µs   48µs 263µs    18871.    23.4KB     8  8497
#> 2 recode     53.3µs 67.2µs 62.9µs 655µs    14878.   104.1KB    11  7005
#> # ... with 1 more variable: total_time <bch:tm>
summary(bm1, relative = TRUE)
#> # A tibble: 2 x 10
#>   expression   min  mean median   max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>      <dbl> <dbl>  <dbl> <dbl>     <dbl>     <dbl> <dbl> <dbl>
#> 1 revalue     1     1      1     1         1.27      1     1     1.21
#> 2 recode      1.29  1.27   1.31  2.49      1         4.46  1.38  1
#> # ... with 1 more variable: total_time <dbl>
Created on 2018-09-26 by the reprex package (v0.2.0.9000).
I'm confused, do we have an equivalent to plyr::mapvalues() in tidyverse or not?
`mapvalues()` is very useful. I use it often. And I don't know of a good replacement.
As a diehard tidyverse user, this gets awkward; there are lots of posts about headaches from incorrectly loading plyr and dplyr together, and `mapvalues` currently stands officially outside the tidyverse, as `library(tidyverse)` does not get you access to that function.
As plyr is slowly fading out and has been replaced by dplyr, more and more people will find it clunky to call that one great function from an otherwise deprecated package.
Would vctrs be the place for `mapvalues`, or a similar function, in the tidyverse?
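In the meantime, a thin wrapper along the following lines might cover the common character-vector case. `map_values()` is a hypothetical name, not an existing tidyverse function; it only re-packages the `recode(x, !!!y)` idiom from earlier in this thread behind a `mapvalues(x, from, to)`-style signature.

```r
# Hypothetical wrapper (not an existing tidyverse function): mimics the
# mapvalues(x, from, to) calling convention on top of dplyr::recode().
map_values <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  # Build the old = new named vector and splice it into recode()'s dots.
  dplyr::recode(x, !!!stats::setNames(to, from))
}

mapped <- map_values(
  month.name[1:4],
  from = c("January", "April"),
  to = c("winter", "spring")
)
mapped
```

This still requires dplyr, but it avoids installing or loading plyr for a single function.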