Closed: hadley closed this issue 2 years ago
Is this related to recode()? Can ??? be empty?
A sketchy implementation on top of dplyr::case_when(), just to illustrate. This is probably not how it would actually be implemented:
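library(dplyr, warn.conflicts = FALSE)
library(rlang)

recode_when <- function(x, ...) {
  # Collect the formulas, dropping NULL inputs (compact_null() is a dplyr internal)
  fs <- dplyr:::compact_null(list2(...))
  fs <- purrr::map(fs, function(.f) {
    if (is_null(f_lhs(.f))) {
      # One-sided formula: treat it as the default case
      f_lhs(.f) <- TRUE
    } else {
      # Two-sided formula: rewrite `values ~ new` as `x %in% values ~ new`
      .f <- new_formula(
        expr(`...x` %in% !!f_lhs(.f)),
        f_rhs(.f),
        env = env(f_env(.f), ...x = x)
      )
    }
    .f
  })
  case_when(!!!fs)
}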
tibble(
  alphabet = letters,
  type = recode_when(
    alphabet,
    c("a", "e", "i", "o", "u") ~ "vowel",
    ~ "consonant"
  )
)
#> # A tibble: 26 x 2
#>    alphabet type
#>    <chr>    <chr>
#>  1 a        vowel
#>  2 b        consonant
#>  3 c        consonant
#>  4 d        consonant
#>  5 e        vowel
#>  6 f        consonant
#>  7 g        consonant
#>  8 h        consonant
#>  9 i        vowel
#> 10 j        consonant
#> # … with 16 more rows
Created on 2021-04-30 by the reprex package (v2.0.0)
This might be a resolution of the recode() API weirdness (i.e. in recode() the new element should be on the left, not the right). Here, the connection to case_when() is stronger, so it makes more sense for the condition to be on the left.
The other option for the "default" value would be a named argument:
alphabet %>% recode_when(
  c("a", "e", "i", "o", "u") ~ "vowel",
  NA ~ "missing",
  default = "consonant",
)
Or we could have some magic sentinel:
alphabet %>% recode_when(
  c("a", "e", "i", "o", "u") ~ "vowel",
  default ~ "consonant"
)
(which we could also use in case_when())
We also need to decide what happens if default is not set. Just leave existing values as is?
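To make the "leave existing values as is" option concrete, here is a rough base-R sketch. The name recode_when_sketch() and the choice of default = x are assumptions for illustration only, not a proposed implementation; it only handles two-sided formulas.

```r
# Hypothetical helper (not part of dplyr): value recoding where an unset
# `default` leaves unmatched values of `x` unchanged.
recode_when_sketch <- function(x, ..., default = x) {
  formulas <- list(...)
  # Start from the default, recycled to the length of x
  out <- rep_len(default, length(x))
  done <- rep(FALSE, length(x))
  for (i in seq_along(formulas)) {
    f <- formulas[[i]]
    stopifnot(inherits(f, "formula"), length(f) == 3L)
    # Evaluate both sides in the environment where the formula was created
    values <- eval(f[[2]], environment(f))
    replacement <- eval(f[[3]], environment(f))
    # The first matching formula wins, as in case_when()
    hit <- !done & x %in% values
    out[hit] <- replacement
    done <- done | hit
  }
  out
}

x <- c("a", "b", NA, "e")
recode_when_sketch(x, c("a", "e", "i", "o", "u") ~ "vowel")
#> [1] "vowel" "b"     NA      "vowel"
recode_when_sketch(
  x,
  c("a", "e", "i", "o", "u") ~ "vowel",
  NA ~ "missing",
  default = "consonant"
)
#> [1] "vowel"     "consonant" "missing"   "vowel"
```

With default = x the result can only keep the type of x, so a replacement of a different type would be silently coerced here; a real implementation would presumably need a common-type check.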
All left-hand sides and the first argument need to have a common type. Then use vec_in().
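Roughly what that could look like with vctrs, as a sketch only. The lhs list stands in for the evaluated left-hand sides of the formulas; how they would actually be collected is not decided here.

```r
library(vctrs)

x <- c("a", "b", NA, "e")
# Illustrative stand-in for the evaluated left-hand sides of the formulas
lhs <- list(c("a", "e", "i", "o", "u"), NA)

# Common type of x and all left-hand sides; errors if they are incompatible
ptype <- vec_ptype_common(x, !!!lhs)

# Cast everything to that type, then match each left-hand side with vec_in()
x_cast <- vec_cast(x, ptype)
matches <- lapply(lhs, function(values) {
  vec_in(x_cast, vec_cast(values, ptype))
})

matches[[1]]
#> [1]  TRUE FALSE FALSE  TRUE
matches[[2]]
#> [1] FALSE FALSE  TRUE FALSE
```

A bare NA is treated as unspecified by vctrs, so it does not force a logical common type, and vec_in() matches missing values by default (na_equal = TRUE).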
Maybe call it case_value()/case_in()? (case_when() could be case_if().) Worth looking at what SQL calls these different variants: MS SQL calls them simple and searched CASE expressions.
Related: #29. With this implementation: https://github.com/lionel-/recode/blob/master/R/recode.R
The idea was to use tibbles to define pairs of values to make it easier to reuse. Users often struggle to define case_when() formulas programmatically or reuse them, and tibbles should be more flexible. The recoding tibbles can be defined at the start of a script before data cleanup routines. I like that this serves as a preamble documenting the bare state of the data as well as the expected state after cleanup.
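For readers who have not seen the table-driven approach, a minimal base-R sketch of the idea follows. This is not the API of the linked package; the column names old and new and the helper recode_with_table() are made up for illustration.

```r
# Hypothetical recoding table, defined once near the top of a script
vowel_key <- data.frame(
  old = c("a", "e", "i", "o", "u"),
  new = "vowel",
  stringsAsFactors = FALSE
)

# Hypothetical helper: look each value up in the table and
# fall back to `default` where there is no match
recode_with_table <- function(x, key, default = x) {
  idx <- match(x, key$old)
  found <- !is.na(idx)
  out <- rep_len(default, length(x))
  out[found] <- key$new[idx[found]]
  out
}

recode_with_table(c("a", "b", "e"), vowel_key)
#> [1] "vowel" "b"     "vowel"
recode_with_table(c("a", "b", "e"), vowel_key, default = "consonant")
#> [1] "vowel"     "consonant" "vowel"
```

Because the key is ordinary data, it can be built programmatically, joined, or reused across several columns without constructing formulas.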
One generic option is to make the value matching into a logical condition with %in%:
case_when(
  x %in% c("a", "e", "i", "o", "u") ~ "vowel",
  default = "consonant"
)
Takes a first argument, then the LHS of each formula gives the values to match
Not clear what placeholder we'd use instead of TRUE
e.g. https://github.com/tidyverse/dplyr/issues/5811