tidyverse / funs

Collection of low-level functions for working with vctrs

Consider `case_when()` variant returning ordered factors #72

Closed Torvaney closed 2 years ago

Torvaney commented 3 years ago

I frequently find myself using the following pattern with case_when:

data %>%
  mutate(
    some_col = case_when(
      cond1 ~ "result1",
      cond2 ~ "result2",
      # etc
    ) %>% fct_relevel("result1", "result2", ...etc...)
  )

In other words, using case_when to create a factor. In most cases, it is trivial to order the case expressions to match the desired factor order (in general, I think it's a little more readable to keep them in factor order anyway).

Since the factor order is already (implicitly) defined in the order of the case expressions, there is some redundancy in this pattern. Consequently, I think it would be useful to provide a version of case_when that returns a factor with the level order defined by the expression order. I think the most natural ways to add this functionality would be with a new function, or by adding an argument to case_when, but I don't have a strong feeling about this.

This would be particularly useful during data exploration, where many different ordered groupings may be tried out with case_when; having to change the factor levels in multiple places quickly becomes tedious and error-prone.

This use case is common enough to have been brought up independently on Stack Overflow: https://stackoverflow.com/questions/49572416/r-convert-to-factor-with-order-of-levels-same-with-case-when

For clarity, here is a simplified demo implementation I have been using:

suppressPackageStartupMessages({
  library(dplyr)
  library(purrr)
  library(rlang)
  library(forcats)
})

factored_case_when <- function(...) {
  args <- list2(...)
  rhs <- map(args, f_rhs)

  cases <- case_when(
    !!!args
  )

  exec(fct_relevel, cases, !!!rhs)
}

numbers <- c(2, 7, 4, 3, 8, 9, 3, 5, 2, 7, 5, 4, 1, 9, 8)

factored_case_when(
  numbers <= 2 ~ "Very small",
  numbers <= 3 ~ "Small",
  numbers <= 6 ~ "Medium",
  numbers <= 8 ~ "Large",
  TRUE    ~ "Huge!"
)
#>  [1] Very small Large      Medium     Small      Large      Huge!     
#>  [7] Small      Medium     Very small Large      Medium     Medium    
#> [13] Very small Huge!      Large     
#> Levels: Very small Small Medium Large Huge!

Created on 2021-09-24 by the reprex package (v0.3.0)

An alternative forcats-inspired API is also posted here.


This was originally posted as a feature request to dplyr (tidyverse/dplyr#6029) and has been re-posted here (per the request in the linked thread).

DavisVaughan commented 2 years ago

This seems to come up quite a bit (at least in my wife's work) when you use a case_when() to bucket data in preparation for creating a graph or gt table where the order of the levels matters, so I agree that some variant of this would be useful. She basically uses case_when() %>% factor(levels = ) as a more readable version of cut().
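To make the comparison with cut() concrete, here is a minimal sketch. The breaks and labels reuse the values from the demo earlier in the thread and are only illustrative; the variable names `via_cut` and `via_case_when` are made up for this example.

```r
library(dplyr)

numbers <- c(2, 7, 4, 3, 8, 9)
labels <- c("Very small", "Small", "Medium", "Large", "Huge!")

# cut(): the breaks and the labels live in two separate vectors,
# so the reader has to line them up by position
via_cut <- cut(numbers, breaks = c(-Inf, 2, 3, 6, 8, Inf), labels = labels)

# case_when() %>% factor(): each condition sits next to its label,
# but the level order has to be repeated in `levels =`
via_case_when <- case_when(
  numbers <= 2 ~ "Very small",
  numbers <= 3 ~ "Small",
  numbers <= 6 ~ "Medium",
  numbers <= 8 ~ "Large",
  TRUE ~ "Huge!"
) %>% factor(levels = labels)
```

Both approaches should bucket the values the same way here; the repeated `levels = labels` in the second one is the redundancy this issue is about.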

DavisVaughan commented 2 years ago

Worth mentioning that you can use .default and .ptype in the dev version of dplyr to help with this a little bit:

dplyr::case_when(
  letters %in% c("a", "e", "i", "o", "u") ~ "vowel",
  .default = "consonant",
  .ptype = factor(levels = c("vowel", "consonant"), ordered = TRUE)
)
#>  [1] vowel     consonant consonant consonant vowel     consonant consonant
#>  [8] consonant vowel     consonant consonant consonant consonant consonant
#> [15] vowel     consonant consonant consonant consonant consonant vowel    
#> [22] consonant consonant consonant consonant consonant
#> Levels: vowel < consonant

But it would still probably be nice to have a forcats-specific helper for this that flips the order of the inputs, like:

fct_case_when(
  vowel = letters %in% c("a", "e", "i", "o", "u"),
  .default = "consonant",
  .ordered = TRUE
)
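For illustration only, here is a rough base-R sketch of how a helper with that shape could behave. `fct_case_when_sketch()` is a hypothetical name and is not a forcats function; the sketch assumes every condition is a named logical vector of the same length, that the first matching condition wins (as in case_when()), and that `.default` becomes the last level.

```r
# Hypothetical sketch, not part of forcats or dplyr.
fct_case_when_sketch <- function(..., .default = NULL, .ordered = FALSE) {
  conds <- list(...)                  # named logical vectors, in level order
  lvls <- c(names(conds), .default)   # levels follow the argument order
  n <- length(conds[[1]])

  # Start from the default (or NA when no default is given)
  out <- rep(if (is.null(.default)) NA_character_ else .default, n)

  # Walk the conditions in reverse so that earlier conditions overwrite
  # later ones, i.e. the first matching condition wins
  for (i in rev(seq_along(conds))) {
    out[which(conds[[i]])] <- names(conds)[i]
  }

  factor(out, levels = lvls, ordered = .ordered)
}

res <- fct_case_when_sketch(
  vowel = letters %in% c("a", "e", "i", "o", "u"),
  .default = "consonant",
  .ordered = TRUE
)
```

A real implementation would also need input validation (unnamed conditions, mismatched lengths), which this sketch skips.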

See also https://github.com/tidyverse/forcats/issues/298

DavisVaughan commented 2 years ago

Actually, I'm convinced this is just a duplicate of https://github.com/tidyverse/forcats/issues/298 and will eventually live in forcats, so I'm going to close this one.