tidymodels / probably

Tools for post-processing class probability estimates
https://probably.tidymodels.org/
Other
115 stars 15 forks source link

Fix by_group processing for factors. #135

Closed topepo closed 7 months ago

topepo commented 7 months ago

Closes #127

purrr::transpose() was converting factors to integers.

simonpcouch commented 7 months ago

Previously, split_dplyr_groups() was able to handle multiple grouping variables:

library(modeldata)
library(dplyr)

penguins_grouped2 <- penguins %>% group_by(species, island)

probably:::split_dplyr_groups(penguins_grouped2)
#> [[1]]
#> [[1]]$data
#> # A tibble: 0 × 7
#> # ℹ 7 variables: species <fct>, island <fct>, bill_length_mm <dbl>,
#> #   bill_depth_mm <dbl>, flipper_length_mm <int>, body_mass_g <int>, sex <fct>
#> 
#> [[1]]$filter
#> species == 1L & island == 1L
#> 
#> [[1]]$rows
#> [1] 0
#> 
#> 
#> [[2]]
#> [[2]]$data
#> # A tibble: 0 × 7
#> # ℹ 7 variables: species <fct>, island <fct>, bill_length_mm <dbl>,
#> #   bill_depth_mm <dbl>, flipper_length_mm <int>, body_mass_g <int>, sex <fct>
#> 
#> [[2]]$filter
#> species == 1L & island == 2L
#> 
#> [[2]]$rows
#> [1] 0
#> 
#> 
#> [[3]]
#> [[3]]$data
#> # A tibble: 0 × 7
#> # ℹ 7 variables: species <fct>, island <fct>, bill_length_mm <dbl>,
#> #   bill_depth_mm <dbl>, flipper_length_mm <int>, body_mass_g <int>, sex <fct>
#> 
#> [[3]]$filter
#> species == 1L & island == 3L
#> 
#> [[3]]$rows
#> [1] 0
#> 
#> 
#> [[4]]
#> [[4]]$data
#> # A tibble: 0 × 7
#> # ℹ 7 variables: species <fct>, island <fct>, bill_length_mm <dbl>,
#> #   bill_depth_mm <dbl>, flipper_length_mm <int>, body_mass_g <int>, sex <fct>
#> 
#> [[4]]$filter
#> species == 2L & island == 2L
#> 
#> [[4]]$rows
#> [1] 0
#> 
#> 
#> [[5]]
#> [[5]]$data
#> # A tibble: 0 × 7
#> # ℹ 7 variables: species <fct>, island <fct>, bill_length_mm <dbl>,
#> #   bill_depth_mm <dbl>, flipper_length_mm <int>, body_mass_g <int>, sex <fct>
#> 
#> [[5]]$filter
#> species == 3L & island == 1L
#> 
#> [[5]]$rows
#> [1] 0

Created on 2024-02-22 with reprex v2.1.0

With this PR, it cannot:

library(modeldata)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

penguins_grouped2 <- penguins %>% group_by(species, island)

probably:::split_dplyr_groups(penguins_grouped2)
#> Error in `rlang::sym()`:
#> ! Can't convert a character vector to a symbol.

Created on 2024-02-22 with reprex v2.1.0

It doesn't like any tests fail as a result of this, but the comments on the function and the reduce() call seem to indicate it ought to, so I'll assume it should.

topepo commented 7 months ago

Thanks. We'll need to put some better error checking in. It is designed to support a single grouping variable. The docs for .by has

The column identifier for the grouping variable. This should be a single unquoted column name that selects a qualitative variable for grouping. Default to NULL. When .by = NULL no grouping will take place.