tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/

`crossing()` adds missing factor levels (either a bug or a documentation issue) #1526

Open billdenney opened 11 months ago

billdenney commented 11 months ago

When crossing() is given a factor that does not have all of its levels present, it inserts the missing levels. My guess is that this is intentional and that the documentation should clarify it. When I was using crossing(), the expansion of missing levels was unexpected because it added values that were not present in any of the inputs.
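The gap between a factor's observed values and its declared levels can be seen in base R alone. This sketch assumes, based on the output in this issue rather than on a reading of the tidyr source, that crossing() takes a factor's values from its levels instead of from the values actually present:

```r
# A factor whose declared levels include "c", which never occurs in the data
f <- factor(c("a", "b"), levels = c("a", "b", "c"))

# Values actually present (the unused level is still listed in Levels)
unique(f)
#> [1] a b
#> Levels: a b c

# All declared levels, including the unused "c"
levels(f)
#> [1] "a" "b" "c"
```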

The observed behavior is shown below; I expected the second behavior in both cases.

I think the following documentation changes would have made this behavior expected:

In the lines linked below, add "and the nesting() and crossing() helpers" after "complete()":

https://github.com/tidyverse/tidyr/blob/ad628413265c343bd7920b4be9b1ac35529741a9/R/expand.R#L36-L38
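For comparison, this is the documented expand() behavior that the suggested wording refers to: when a column is a factor, expand() returns all of its levels, not only those present in the data. A minimal sketch, where the data frame `df` is illustrative:

```r
library(tidyr)

# Illustrative data: level "c" is declared but never observed
df <- data.frame(b = factor(c("a", "b"), levels = c("a", "b", "c")))

# expand() uses the full set of factor levels, so the result has
# three rows (a, b, c) even though "c" does not occur in df
out <- expand(df, b)
out
```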

Also, the expand_grid() documentation could point back to the fact that it uses expand(), perhaps by adding something like the following to its details section: "expand_grid() uses expand() to generate all combinations."

library(tidyr)

# factor levels are expanded
crossing(a = c("A", "B"), b = factor(c("a", "b"), levels = c("a", "b", "c")))
#> # A tibble: 6 × 2
#>   a     b    
#>   <chr> <fct>
#> 1 A     a    
#> 2 A     b    
#> 3 A     c    
#> 4 B     a    
#> 5 B     b    
#> 6 B     c

# factor levels are not expanded
crossing(
  a = c("A", "B"),
  data.frame(b = factor(c("a", "b"), levels = c("a", "b", "c")))
)
#> # A tibble: 4 × 2
#>   a     b    
#>   <chr> <fct>
#> 1 A     a    
#> 2 A     b    
#> 3 B     a    
#> 4 B     b

Created on 2023-10-28 with reprex v2.0.2
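If only the observed values are wanted, one possible workaround is to drop unused levels before crossing. This is a sketch using base R droplevels(), not a documented tidyr recommendation:

```r
library(tidyr)

b <- factor(c("a", "b"), levels = c("a", "b", "c"))

# droplevels() removes the unused level "c" before crossing() sees the factor,
# so only observed values are combined: 4 rows instead of 6
out <- crossing(a = c("A", "B"), b = droplevels(b))
out
```

Unlike the data-frame form in the reprex above, this also removes "c" from the factor's levels, which matters if later code relies on the full level set.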

If helpful, I can make the documentation PR.