tidymodels / probably

Tools for post-processing class probability estimates
https://probably.tidymodels.org/

`cal_estimate_*()` with factor variable passed to `.by` fails #127

Closed tonyelhabr closed 4 months ago

tonyelhabr commented 10 months ago

This may or may not be intended behavior, but my expectation is that this would work.

library(probably)
packageVersion("probably")
#> [1] '1.0.1.9000'
data("segment_logistic")
segment_logistic$dummy_group <- c(
  rep("A", 500),
  rep("B", 300),
  rep("C", 210)
)

## 1. works as expected for a character field
cal_estimate_beta(segment_logistic, Class, .by = dummy_group)
#> 
#> ── Probability Calibration
#> Method: Beta calibration
#> Type: Binary
#> Source class: Data Frame
#> Data points: 1,010, split in 3 groups
#> Truth variable: `Class`
#> Estimate variables:
#> `.pred_good` ==> good
#> `.pred_poor` ==> poor

## 2. doesn't work with a factor group?
segment_logistic$dummy_group <- factor(segment_logistic$dummy_group)
cal_estimate_beta(segment_logistic, Class, .by = dummy_group)
#> Error in family$linkfun(mustart): Argument mu must be a nonempty numeric vector

## 3. works for a numeric field that acts as a pseudo-category
segment_logistic$dummy_group <- as.numeric(segment_logistic$dummy_group)
cal_estimate_beta(segment_logistic, Class, .by = dummy_group)
#> 
#> ── Probability Calibration
#> Method: Beta calibration
#> Type: Binary
#> Source class: Data Frame
#> Data points: 1,010, split in 3 groups
#> Truth variable: `Class`
#> Estimate variables:
#> `.pred_good` ==> good
#> `.pred_poor` ==> poor

I found the same issue with `cal_estimate_isotonic()`, so I think this affects all of the `cal_estimate_*()` functions.
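Since the character version of the column works (case 1), one possible stopgap is to convert the factor to character before calling the calibration function. This is only a sketch built from the calls already shown in this report; `cal_data` is a name introduced here for illustration, and the approach has only been reasoned about for `cal_estimate_beta()`.

```r
library(probably)
data("segment_logistic")

# Rebuild the factor grouping column from case 2 of the report.
segment_logistic$dummy_group <- factor(
  c(rep("A", 500), rep("B", 300), rep("C", 210))
)

# Workaround sketch: use a character copy of the grouping column for the call.
# `cal_data` is a hypothetical name; the original data frame is left untouched.
cal_data <- segment_logistic
cal_data$dummy_group <- as.character(cal_data$dummy_group)

# Same call as case 1, which works for a character field.
cal_estimate_beta(cal_data, Class, .by = dummy_group)
```

Working on a copy keeps the factor column (and its level order) available for any downstream code that depends on it.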

AFAICT this is an issue with using `split_dplyr_groups()` in `cal_*_impl_grp()`.
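To illustrate how a group-splitting step could produce the "nonempty numeric vector" error, here is a base R sketch of one plausible mechanism. It is an assumption, not something confirmed in this report, and it does not reproduce the internals of `split_dplyr_groups()`: if the group keys are reduced to their integer codes somewhere along the way, comparing the factor column against those codes matches no rows, so every per-group subset is empty. The helper `rows_per_key()` and the toy data frame are hypothetical.

```r
# Toy data with a factor grouping column, mirroring case 2.
df <- data.frame(
  x = 1:6,
  grp = factor(c("A", "A", "B", "B", "C", "C"))
)

# Hypothetical helper: number of rows selected by `grp == key` for each key.
rows_per_key <- function(keys) {
  vapply(
    keys,
    function(k) nrow(df[df$grp == k, , drop = FALSE]),
    integer(1),
    USE.NAMES = FALSE
  )
}

# Keys kept as labels: each filter finds its group.
rows_labels <- rows_per_key(as.character(unique(df$grp)))
rows_labels
#> [1] 2 2 2

# Keys reduced to integer codes (assumed lossy step): the factor is compared
# by its labels ("A" vs. "1"), so no rows match and every subset is empty.
rows_codes <- rows_per_key(as.integer(unique(df$grp)))
rows_codes
#> [1] 0 0 0
```

Under this assumption a model fit on each subset would receive zero rows, which would be consistent with the error in case 2, and a numeric column (case 3) would be unaffected because its keys compare equal to themselves.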