tidyverse / forcats

🐈🐈🐈🐈: tools for working with categorical variables (factors)
https://forcats.tidyverse.org/
Other
555 stars 127 forks source link

`fct_collapse()` should allow `other_level = NA` #291

Closed 1pakch closed 1 year ago

1pakch commented 2 years ago

IMO the fct_collapse() function should support using NA as the other_level argument.

As of 0.5.0 one cannot use neither NA to NA_character_

library(purrr)
fac <- c(1, 2, 3, 4) %>% factor()
forcats::fct_collapse(fac, A = c('1', '2'), other_level = NA_character_)
forcats::fct_collapse(fac, A = c('1', '2'), other_level = NA)

and the workaround is quite awkward and too long

(
    fac
    %>% forcats::fct_collapse(A = c('1', '2'), other_level = 'NA')
    %>% na_if('NA')
    %>% fct_drop()
)

I'd be happy to look into it and make a PR. This discussion might be relevant https://github.com/tidyverse/forcats/pull/101

xxspurs commented 2 years ago

您好,我已收到您的邮件,将尽快给您回复。

hadley commented 2 years ago

Reprex:

library(forcats)
x <- factor(1:4)
fct_collapse(x, A = c("1", "2"), other_level = "other")
#> [1] A     A     other other
#> Levels: A other
fct_collapse(x, A = c("1", "2"), other_level = NA)
#> Error in new[[other_level]] <- levels[!levels %in% levs]: attempt to select less than one element in integerOneIndex

Created on 2022-03-02 by the reprex package (v2.0.1)

hadley commented 1 year ago

If we support this, also need to make sure it works in fct_lump_*() and fct_other().

fct_collapse() will need a decent amount of work to achieve this since it currently relies on fct_recode(). So maybe it make more sense to add a new fct_implicit_na() specifically for this purpose?

hadley commented 1 year ago

Hmmm, rather surprisingly you can already do what you want with this:

library(forcats)
x <- factor(1:4)
fct_collapse(x, A = c("1", "2"), other_level = "NULL")
#> [1] A    A    <NA> <NA>
#> Levels: A

Created on 2023-01-03 with reprex v2.0.2