tidyverse / forcats

🐈🐈🐈🐈: tools for working with categorical variables (factors)
https://forcats.tidyverse.org/

Wrapper for stringr functions that keeps factor levels. #286

Closed by orgadish 2 years ago

orgadish commented 3 years ago

This is related to stringr issue #394. I'm not sure if this is best handled in forcats or stringr.

I often have factor data that I want to rework slightly for a plot. When I use something like str_to_upper or str_replace to do so, the result is a plain character vector and the factor levels are lost. Keeping them should be pretty straightforward, though: you just apply the same function to both the levels and the values.

I wrote a wrapper function for myself, fct_stringr, to do this, and I think it could be useful to others.

library(tidyverse)

iris_with_reordered_levels <- iris %>% 
  mutate(across(Species, fct_relevel, "virginica"))

iris_with_reordered_levels %>% 
  pull(Species) %>% 
  levels    
#> [1] "virginica"  "setosa"     "versicolor"

iris_with_reordered_levels %>% 
  mutate(across(Species, str_to_upper)) %>% 
  pull(Species) %>% 
  levels    
#> NULL

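fct_stringr <- function(.f, .fn, ...) {
  # Transform the level labels; unique() guards against .fn mapping two levels to the same label
  new_levels <- unique(.fn(levels(.f), ...))
  # Transform the values as strings, then rebuild the factor in the original level order
  new_str <- .fn(as.character(.f), ...)
  factor(new_str, levels = new_levels)
}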

iris_with_reordered_levels %>% 
  mutate(across(Species, fct_stringr, str_to_upper)) %>% 
  pull(Species) %>% 
  levels
#> [1] "VIRGINICA"  "SETOSA"     "VERSICOLOR"

Created on 2021-09-30 by the reprex package (v2.0.1)
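
For comparison, the same idea can be sketched in base R by transforming only the level labels, since the values of a factor are stored as indices into its levels. This is a minimal sketch, not part of the original post: `relabel_levels` is a hypothetical helper name, and `toupper` stands in for `str_to_upper` so that no packages are needed.

```r
# Hypothetical example data with a deliberately non-alphabetical level order
f <- factor(
  c("setosa", "virginica", "versicolor", "setosa"),
  levels = c("virginica", "setosa", "versicolor")
)

# Hypothetical helper: apply `fn` to the level labels only.
# The values follow automatically because they are indices into the levels.
relabel_levels <- function(f, fn, ...) {
  levels(f) <- fn(levels(f), ...)
  f
}

g <- relabel_levels(f, toupper)
levels(g)
#> [1] "VIRGINICA"  "SETOSA"     "VERSICOLOR"
as.character(g)
#> [1] "SETOSA"     "VIRGINICA"  "VERSICOLOR" "SETOSA"
```

Because only the labels change, the level order set earlier is kept without rebuilding the factor.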

orgadish commented 2 years ago

I created a similar issue in stringr suggesting something like this, and @hadley made the excellent point that forcats::fct_relabel already does this:

forcats::fct_relabel(head(iris$Species), stringr::str_to_upper)
#> [1] SETOSA SETOSA SETOSA SETOSA SETOSA SETOSA
#> Levels: SETOSA VERSICOLOR VIRGINICA

Created on 2021-11-29 by the reprex package (v2.0.1)

So this feature already exists! It might still be useful at some point to document this use case more explicitly in the help or a vignette, as it's something I come across often.
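
As an illustration of what such a documented example might look like, here is a small sketch of fct_relabel with a stringr function that takes extra arguments. The data and variable names are made up for the example; it assumes forcats and stringr are installed, and relies on fct_relabel passing arguments after the function on to it.

```r
library(forcats)
library(stringr)

# Hypothetical example data with a deliberately non-alphabetical level order
species <- factor(
  c("setosa", "versicolor", "virginica", "setosa"),
  levels = c("virginica", "setosa", "versicolor")
)

# Arguments after the function are passed on to it,
# so this calls str_replace(levels(species), "^v", "V")
relabelled <- fct_relabel(species, str_replace, "^v", "V")
levels(relabelled)
#> [1] "Virginica"  "setosa"     "Versicolor"

# If the function maps several levels to the same label, those levels are merged
initials <- fct_relabel(species, str_sub, 1, 1)
nlevels(initials)
```

The level order from the original factor carries over, which is the behaviour the fct_stringr wrapper was written to get.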