Closed Torvaney closed 2 years ago
This seems to come up quite a bit (at least in my wife's work) when you use case_when() to bucket data before creating a graph or gt table where the order of the levels matters, so I agree that some variant of this would be useful. She basically uses case_when() %>% factor(levels = ) as a more readable version of cut().
Worth mentioning that you can use .default and .ptype in the dev version of dplyr to help with this a little bit:
dplyr::case_when(
  letters %in% c("a", "e", "i", "o", "u") ~ "vowel",
  .default = "consonant",
  .ptype = factor(levels = c("vowel", "consonant"), ordered = TRUE)
)
#> [1] vowel consonant consonant consonant vowel consonant consonant
#> [8] consonant vowel consonant consonant consonant consonant consonant
#> [15] vowel consonant consonant consonant consonant consonant vowel
#> [22] consonant consonant consonant consonant consonant
#> Levels: vowel < consonant
But it would still probably be nice to have a forcats-specific helper for this that flips the order of the inputs, like:
fct_case_when(
  vowel = letters %in% c("a", "e", "i", "o", "u"),
  .default = "consonant",
  .ordered = TRUE
)
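To make the proposed call shape concrete, here is a rough base-R sketch of how such a helper might behave. No fct_case_when() exists in forcats as far as this thread shows; the argument handling, the placement of .default as the last level, and the assumption that all conditions are logical vectors of the same length are guesses made for illustration only.

```r
# Hypothetical helper: NOT an existing forcats function.
# Each named argument is a logical condition; the name is the level label.
fct_case_when <- function(..., .default = NA_character_, .ordered = FALSE) {
  conditions <- list(...)
  labels <- names(conditions)

  # Start with the default everywhere (assumes equal-length conditions).
  out <- rep(.default, length(conditions[[1]]))

  # Fill from last to first so that earlier conditions win, as in case_when().
  for (i in rev(seq_along(conditions))) {
    out[which(conditions[[i]])] <- labels[i]
  }

  # Levels follow argument order; the default (if any) goes last.
  lvls <- if (is.na(.default)) labels else unique(c(labels, .default))
  factor(out, levels = lvls, ordered = .ordered)
}

f <- fct_case_when(
  vowel = letters %in% c("a", "e", "i", "o", "u"),
  .default = "consonant",
  .ordered = TRUE
)
levels(f)
```

A real implementation would likely want type and length checks, and a decision about what to do when .default duplicates one of the labels.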
Actually, I'm convinced this is just a duplicate of https://github.com/tidyverse/forcats/issues/298 and will eventually live in forcats, so I'm going to close this one.
I frequently find myself using the following pattern with case_when:
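A sketch of the kind of pattern meant here, matching the case_when() %>% factor(levels = ) idiom mentioned elsewhere in this thread. The data frame, column name, and cut-offs are invented for illustration.

```r
library(dplyr)

# Hypothetical data: the column name and thresholds are made up.
df <- tibble(x = c(3, 12, 47, 8, 25))

df <- df %>%
  mutate(
    size = case_when(
      x < 10 ~ "small",
      x < 30 ~ "medium",
      TRUE   ~ "large"
    ) %>%
      # The levels repeat the labels above, in the same order.
      factor(levels = c("small", "medium", "large"))
  )

levels(df$size)
```

Note that every label appears twice: once on the right-hand side of a case and once in levels.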
In other words, using case_when to create a factor. In most cases, it is trivial to order the case expressions to match the desired factor order (in general, I think it's a little more readable to keep them in factor order anyway).

Since the factor order is already (implicitly) defined in the order of the case expressions, there is some redundancy in this pattern. Consequently, I think it would be useful to provide a version of case_when that returns a factor with the level order defined by the expression order. I think the most natural ways to add this functionality would be with a new function, or by adding an argument to case_when, but I don't have a strong feeling about this.

This would be particularly useful during data exploration, where many different ordered groupings may be tried out with case_when; having to change the factor levels in multiple places quickly becomes tedious and error-prone.

This use-case is common enough to have been brought up independently on stackoverflow: https://stackoverflow.com/questions/49572416/r-convert-to-factor-with-order-of-levels-same-with-case-when
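One way the idea could be sketched is a thin wrapper that reads the labels off the right-hand side of each formula and uses them, in order, as the factor levels. The name case_when_fct is made up for this example and is not part of dplyr or forcats; the sketch also assumes that every right-hand side is a single string literal.

```r
library(dplyr)

# Hypothetical wrapper: the name and behaviour are assumptions.
case_when_fct <- function(...) {
  formulas <- list(...)

  # f[[3]] is the right-hand side of a two-sided formula.
  # Assumes each right-hand side is a single string literal.
  labels <- vapply(formulas, function(f) as.character(f[[3]]), character(1))

  # Level order follows the order in which labels first appear.
  factor(dplyr::case_when(...), levels = unique(labels))
}

x <- c(3, 12, 47, 8, 25)
sizes <- case_when_fct(
  x < 10 ~ "small",
  x < 30 ~ "medium",
  TRUE   ~ "large"
)
levels(sizes)
```

With this shape, reordering or renaming a group only requires touching the case expressions themselves.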
For clarity, here is a simplified demo implementation I have been using:
Created on 2021-09-24 by the reprex package (v0.3.0)
An alternative forcats-inspired API is also posted here.
This was originally posted as a feature request to dplyr (tidyverse/dplyr#6029) and has been re-posted here (per the request in the linked thread).