tidyverse / funs

Collection of low-level functions for working with vctrs

Consider `case_when()` variant that uses values #60

Closed: hadley closed this issue 2 years ago

hadley commented 3 years ago

It takes the vector to match as its first argument; the left-hand side of each formula then gives the values to match:

something_when(x,
  c("a", "e", "i", "o", "u") ~ "vowel",
  ??? ~ "consonant"
)

It's not clear what placeholder we'd use instead of TRUE for the fallback case.

e.g. https://github.com/tidyverse/dplyr/issues/5811

romainfrancois commented 3 years ago

Is this related to recode()? Can ??? be emptiness, i.e. a formula with no left-hand side?

A sketchy implementation on top of dplyr::case_when(), just to illustrate. That's probably not how this would actually be implemented:

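library(dplyr, warn.conflicts = FALSE)
library(rlang)

recode_when <- function(x, ...) {
  # Drop NULL inputs
  fs <- purrr::compact(list2(...))
  fs <- purrr::map(fs, function(.f) {
    if (is_null(f_lhs(.f))) {
      # One-sided formula (`~ value`): treat it as the catch-all case
      f_lhs(.f) <- TRUE
    } else {
      # Two-sided formula: turn `values ~ result` into `x %in% values ~ result`
      .f <- new_formula(
        expr(`...x` %in% !!f_lhs(.f)),
        f_rhs(.f),
        # Child of the formula's environment that binds `...x` to `x`
        env = env(f_env(.f), ...x = x)
      )
    }
    .f
  })
  case_when(!!!fs)
}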

tibble(
  alphabet = letters,
  type     = recode_when(
    alphabet, 
    c("a", "e", "i", "o", "u") ~ "vowel",
                               ~ "consonant"
  )
)
#> # A tibble: 26 x 2
#>    alphabet type     
#>    <chr>    <chr>    
#>  1 a        vowel    
#>  2 b        consonant
#>  3 c        consonant
#>  4 d        consonant
#>  5 e        vowel    
#>  6 f        consonant
#>  7 g        consonant
#>  8 h        consonant
#>  9 i        vowel    
#> 10 j        consonant
#> # … with 16 more rows

Created on 2021-04-30 by the reprex package (v2.0.0)

hadley commented 3 years ago

This might be a resolution of the recode() API weirdness (i.e. in recode() the new element should be on the left, not the right). Here, the connection to case_when() is stronger so it makes more sense for the condition to be on the left.

The other option for the "default" value would be a named argument:

alphabet %>% recode_when(
  c("a", "e", "i", "o", "u") ~ "vowel",
  NA ~ "missing",
  default = "consonant",
)

Or we could have some magic sentinel:

alphabet %>% recode_when(
  c("a", "e", "i", "o", "u") ~ "vowel",
  default ~ "consonant"
)

(which we could also use in case_when())

We also need to decide what happens if default is not set. Just leave the existing values as is?

All left-hand sides and the first argument need to have a common type. Then do the matching with vec_in().
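A rough sketch of the common-type-then-vec_in() idea, just to make it concrete. The function name case_value_sketch() and its values/results/default arguments are made up for this illustration; nothing here is a proposed API, and it assumes that the first matching set of values wins, as in case_when().

```r
library(vctrs)

# Hypothetical helper, for illustration only.
# `values` is a list of vectors to match against, `results` is a list of
# replacement values (one per element of `values`).
case_value_sketch <- function(x, values, results, default) {
  # Find a type common to `x` and every set of values.
  # vec_ptype2() errors if two types are incompatible (e.g. character vs double).
  ptype <- Reduce(vec_ptype2, values, vec_ptype(x))
  x <- vec_cast(x, ptype)
  values <- lapply(values, vec_cast, to = ptype)

  out <- rep(default, vec_size(x))
  done <- rep(FALSE, vec_size(x))
  for (i in seq_along(values)) {
    # vec_in() is the vctrs counterpart of %in%
    hit <- vec_in(x, values[[i]]) & !done
    out[hit] <- results[[i]]
    done <- done | hit
  }
  out
}

case_value_sketch(
  c("a", "b", "e"),
  values  = list(c("a", "e", "i", "o", "u")),
  results = list("vowel"),
  default = "consonant"
)
```

In this sketch a type mismatch (say, a character x with numeric values on the left) fails up front in vec_ptype2() instead of silently never matching.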

Maybe call it case_value()/case_in()? (case_when() could then be case_if().) It's worth looking at what SQL calls these different variants: MS SQL calls them simple and searched CASE expressions.

lionel- commented 3 years ago

Related: #29. With this implementation: https://github.com/lionel-/recode/blob/master/R/recode.R

The idea was to use tibbles to define pairs of values to make them easier to reuse. Users often struggle to define case_when() formulas programmatically or to reuse them, and tibbles should be more flexible. The recoding tibbles can be defined at the start of a script, before the data cleanup routines. I like that this serves as a preamble documenting the bare state of the data as well as the expected state after cleanup.
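To illustrate the general idea of a recoding tibble, here is a minimal sketch built on base match(). It is not the API of the linked recode package; the names vowel_key and recode_with_table() and the old/new column names are assumptions made up for this example.

```r
library(tibble)

# Hypothetical recoding table, defined once at the top of a script.
# `new` is recycled to the length of `old`.
vowel_key <- tibble(
  old = c("a", "e", "i", "o", "u"),
  new = "vowel"
)

# Hypothetical helper: look each element of `x` up in `key$old` and
# return the paired `key$new`, falling back to `default` when unmatched.
recode_with_table <- function(x, key, default) {
  idx <- match(x, key$old)
  out <- key$new[idx]
  out[is.na(idx)] <- default
  out
}

recode_with_table(c("a", "b", "e"), vowel_key, default = "consonant")
#> [1] "vowel"     "consonant" "vowel"
```

Because the key is an ordinary data frame, it can be built programmatically, stored in a file, or reused across several columns.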

DavisVaughan commented 3 years ago

One generic option is to turn the value matching into a logical condition with %in%:

case_when(
  x %in% c("a", "e", "i", "o", "u") ~ "vowel",
  default = "consonant"
)