tidyverse / funs

Collection of low-level functions for working with vctrs

`vec_recode()` #29

Open lionel- opened 5 years ago

lionel- commented 5 years ago

Based on the prototype at https://github.com/lionel-/recode/blob/master/R/recode.R

Specify the mapping of values with a tibble of .new and .old columns (here created with keys(), though I'm not sure about this helper). The .new column acts as a generalisation of names that preserves types. Missing values can also be encoded in the spec, as in the NA example further down. Here is the same spec built with keys() and, equivalently, with tibble::tribble():

keys(0:2, c(4L, 6L, 8L))
#> # A tibble: 3 x 2
#>    .new  .old
#>   <int> <int>
#> 1     0     4
#> 2     1     6
#> 3     2     8

tibble::tribble(
  ~ .new, ~ .old,
  0L, 4L,
  1L, 6L,
  2L, 8L
)
#> # A tibble: 3 x 2
#>    .new  .old
#>   <int> <int>
#> 1     0     4
#> 2     1     6
#> 3     2     8
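The issue does not show how keys() is defined, and the linked prototype may do it differently. A minimal guess, assuming keys() only pairs the two vectors into a tibble and refuses to recycle a length-1 input:

```r
# Hypothetical definition of keys(); the prototype's version may differ.
keys <- function(new, old) {
  # Refuse recycling: every new key needs exactly one old entry.
  if (vctrs::vec_size(new) != vctrs::vec_size(old)) {
    stop("`new` and `old` must have the same size.", call. = FALSE)
  }
  tibble::tibble(.new = new, .old = old)
}

keys(0:2, c(4L, 6L, 8L))
```

Because .old can be any vector, the same helper would also accept a list column or a tibble for .old, which the later examples rely on.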

Basic usage:

vec_recode(mtcars$cyl, keys(0:2, c(4L, 6L, 8L)))
#>  [1] 1 1 0 1 2 1 2 0 0 1 1 2 2 2 2 2 2 0 0 0 0 2 2 2 2 0 0 0 2 1 2 0

vec_recode(mtcars$cyl, keys(0:1, c(4L, 6L)))
#>  [1] 1 1 0 1 8 1 8 0 0 1 1 8 8 8 8 8 8 0 0 0 0 8 8 8 8 0 0 0 8 1 8 0

vec_recode(mtcars$cyl, keys(0:1, c(4L, 6L)), default = 1.5)
#>  [1] 1.0 1.0 0.0 1.0 1.5 1.0 1.5 0.0 0.0 1.0 1.0 1.5 1.5 1.5 1.5 1.5 1.5 0.0
#> [19] 0.0 0.0 0.0 1.5 1.5 1.5 1.5 0.0 0.0 0.0 1.5 1.0 1.5 0.0

vec_recode(mtcars$vs, keys(c("zero", "one"), 0:1))
#>  [1] "zero" "zero" "one"  "one"  "zero" "one"  "zero" "one"  "one"  "one"
#> [11] "one"  "zero" "zero" "zero" "zero" "zero" "zero" "one"  "one"  "one"
#> [21] "one"  "zero" "zero" "zero" "zero" "one"  "zero" "one"  "zero" "zero"
#> [31] "zero" "one"

spec <- keys(c("FOO", "missing"), c("foo", NA))
vec_recode(c("foo", "bar", NA, "foo"), spec, default = "default")
#> [1] "FOO"     "default" "missing" "FOO"
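A minimal sketch of how the matching could work, built on vctrs::vec_match(), which returns the position of each element of x in .old (NA when absent) and by default treats NA as equal to NA, so an NA entry in .old catches missing input. The name vec_recode_sketch() is hypothetical and this is not the prototype's code. The typing rule is an assumption chosen to reproduce the outputs above: the result takes the type of .new (or its common type with default), and without a default, unmatched values are cast to that type, which errors if the cast is lossy. List-column .old is not handled.

```r
library(vctrs)

# Hypothetical sketch, not the prototype's implementation.
# `spec` is a data frame with `.new` and `.old` columns.
vec_recode_sketch <- function(x, spec, default = NULL) {
  n <- vec_size(x)

  # Position of each element of `x` in `.old`; NA when there is no match.
  # vec_match() defaults to na_equal = TRUE, so an NA in `.old` matches NA.
  pos <- vec_match(x, spec$.old)
  hit <- !is.na(pos)

  # Output type: `.new`, or its common type with `default` when supplied.
  ptype <- if (is.null(default)) vec_ptype(spec$.new) else vec_ptype2(spec$.new, default)
  out <- vec_init(ptype, n)

  # Fill matched positions with their new keys (cast to `ptype`).
  out <- vec_assign(out, hit, vec_slice(spec$.new, pos[hit]))

  # Unmatched positions get `default` (recycled), or keep their original
  # value cast to the output type when no default is given.
  if (any(!hit)) {
    fallback <- if (is.null(default)) vec_slice(x, !hit) else default
    out <- vec_assign(out, !hit, fallback)
  }
  out
}

spec <- data_frame(.new = c("FOO", "missing"), .old = c("foo", NA))
vec_recode_sketch(c("foo", "bar", NA, "foo"), spec, default = "default")
#> [1] "FOO"     "default" "missing" "FOO"
```

Taking the output type from .new rather than from x is what lets the mtcars$vs example return a character vector without a default.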

# Corresponding dplyr code:
dplyr::recode(mtcars$cyl, `4` = 0, `6` = 1, `8` = 2)
dplyr::recode(mtcars$cyl, `4` = 0, `6` = 1)
dplyr::recode(mtcars$cyl, `4` = 0, `6` = 1, .default = 1.5)
dplyr::recode(mtcars$vs, `0` = "zero", `1` = "one")
dplyr::recode(c("foo", "bar", NA, "foo"), `foo` = "FOO", .default = "default", .missing = "missing")

You can recode multiple values to the same key by supplying a list column in .old:

spec <- tibble::tribble(
  ~ .new, ~ .old,
  0,      c(4, 6),
  1,      8
)
vec_recode(mtcars$cyl, spec)
#>  [1] 0 0 0 0 1 0 1 0 0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 1 0 1 0
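One possible way to support a list column in .old, sketched here as an assumption rather than the prototype's approach, is to flatten the spec first so that each old value gets its own row, then match as usual. expand_spec() is a hypothetical name; vec_rep_each() and list_unchop() are vctrs functions and need a recent vctrs release.

```r
library(vctrs)

# Hypothetical helper (not part of vctrs or the prototype): flatten a spec
# whose .old column is a list, so each old value sits on its own row
# next to the key it maps to.
expand_spec <- function(spec) {
  old <- spec$.old
  if (!is.list(old) || is.data.frame(old)) {
    return(spec)  # already one old value per row
  }
  sizes <- vapply(old, vec_size, integer(1))
  data_frame(
    .new = vec_rep_each(spec$.new, sizes),  # repeat each key once per old value
    .old = list_unchop(old)                 # concatenate the old values
  )
}

spec <- data_frame(.new = c(0, 1), .old = list(c(4, 6), 8))
flat <- expand_spec(spec)
flat$.new[vec_match(c(6, 4, 8), flat$.old)]
#> [1] 0 0 1
```

Flattening up front keeps the matching step identical for scalar and list-column specs.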

You can also recode a vector into a tibble, producing one row per input element:

spec <- tibble::tibble(
  .new = tibble::tibble(
    x = c("foo", "bar"),
    y = c("quux", "foofy")
  ),
  .old = c(4L, 6L)
)
vec_recode(mtcars$cyl, spec, default = tibble::tibble(x = "plop", y = "plip"))
#> # A tibble: 32 x 2
#>    x     y
#>  * <chr> <chr>
#>  1 bar   foofy
#>  2 bar   foofy
#>  3 foo   quux
#>  4 bar   foofy
#>  5 plop  plip
#>  6 bar   foofy
#>  7 plop  plip
#>  8 foo   quux
#>  9 foo   quux
#> 10 bar   foofy
#> # … with 22 more rows

And you can recode tibbles to vectors:

spec <- tibble::tibble(
  .new = c("foo", "bar"),
  .old = tibble::tibble(
    x = c(1L, 2L),
    y = c(TRUE, FALSE)
  )
)
x <- tibble::tibble(x = c(1, 2, 2, 1), y = c(TRUE, TRUE, FALSE, TRUE))
vec_recode(x, spec, default = "baz")
#> [1] "foo" "baz" "bar" "foo"
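Recoding from a tibble works because vctrs treats each row of a data frame as a single observation: vctrs::vec_match() compares whole rows after casting both inputs to a common type (the spec's integer x column and the input's double x column both become double). A standalone sketch of just that lookup step on the example above, using base data frames and filling in "baz" by hand as the default:

```r
library(vctrs)

old <- data.frame(x = c(1L, 2L), y = c(TRUE, FALSE))
new <- c("foo", "bar")
x <- data.frame(x = c(1, 2, 2, 1), y = c(TRUE, TRUE, FALSE, TRUE))

# Row-wise lookup: whole rows are compared after casting to a common type.
pos <- vec_match(x, old)
pos
#> [1]  1 NA  2  1

out <- new[pos]
out[is.na(pos)] <- "baz"  # default for unmatched rows
out
#> [1] "foo" "baz" "bar" "foo"
```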

In a data cleaning script, all specs can be kept together at the top of the file and then applied with mutate() and its mapping variants to recode variables one at a time or in bulk.

hadley commented 4 years ago

Maybe allow either a pair of new and old arguments or a single new data frame?

hadley commented 4 years ago

Also related to #15

hadley commented 4 years ago

This will also resolve https://github.com/tidyverse/dplyr/issues/4628, since we eliminate the round trip to symbols.

hadley commented 4 years ago

We'll need to come up with a new name for the function so that we can deprecate dplyr::recode().

lionel- commented 4 years ago

Alternatively it might be possible to implement the new functionality in recode(), and deprecate the current functionality when we detect named inputs.