tidymodels / parsnip

A tidy unified interface to models
https://parsnip.tidymodels.org

Allow successes/failures matrix as response for `logistic_reg()` #266

Open juliasilge opened 4 years ago

juliasilge commented 4 years ago

When users need to model an outcome that is a proportion (such as clicks out of impressions, registrations out of visits, etc.), a useful approach is a generalized linear model with family = binomial (i.e. just like parsnip::logistic_reg()) where the response in the formula is not a factor but a "two-column matrix with the columns giving the numbers of successes and failures", according to the docs.

We don't currently support this. Below is the model fit directly with the underlying glm() function, followed by the error we currently get when trying the same formula with parsnip:

library(Sleuth3)

glm(cbind(Extinct, AtRisk - Extinct) ~ log(Area), 
    family = binomial(), data = case2101)
#> 
#> Call:  glm(formula = cbind(Extinct, AtRisk - Extinct) ~ log(Area), family = binomial(), 
#>     data = case2101)
#> 
#> Coefficients:
#> (Intercept)    log(Area)  
#>     -1.1962      -0.2971  
#> 
#> Degrees of Freedom: 17 Total (i.e. Null);  16 Residual
#> Null Deviance:       45.34 
#> Residual Deviance: 12.06     AIC: 75.39

library(parsnip)

logistic_reg() %>%
    set_engine("glm") %>%
    fit(cbind(Extinct, AtRisk - Extinct) ~ log(Area),
        data = case2101)
#> Error: For classification models, the outcome should be a factor.

Created on 2020-02-24 by the reprex package (v0.3.0)
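
Until a matrix response is supported, one possible workaround is to expand the aggregated counts to one row per trial with a factor outcome, which logistic_reg() does accept. The sketch below is only an illustration: the data frame `agg` and its column names are hypothetical (made-up values, not case2101), and expanding rows can get expensive when the counts are large.

```r
# Hypothetical aggregated data: one row per group, with the number of
# successes out of a known number of trials (values are made up).
agg <- data.frame(
  area    = c(1, 5, 20, 80, 300),
  trials  = c(10, 20, 30, 40, 50),
  success = c(4, 7, 8, 8, 6)
)
agg$failure <- agg$trials - agg$success

# Reference fit: two-column matrix response, as in the glm() docs.
fit_matrix <- glm(
  cbind(success, failure) ~ log(area),
  family = binomial(), data = agg
)

# Expand to one row per trial. The first block of rows holds the
# successes for each group, the second block holds the failures.
long <- data.frame(
  area = rep(c(agg$area, agg$area),
             times = c(agg$success, agg$failure)),
  outcome = factor(
    rep(c("success", "failure"),
        times = c(sum(agg$success), sum(agg$failure))),
    # glm() models the probability of the second factor level.
    levels = c("failure", "success")
  )
)

# Same model on the expanded data, with a factor outcome.
fit_long <- glm(outcome ~ log(area), family = binomial(), data = long)

# The coefficient estimates should agree up to numerical tolerance.
coef_match <- isTRUE(
  all.equal(coef(fit_matrix), coef(fit_long), tolerance = 1e-5)
)
print(coef_match)

# The expanded data can be passed to parsnip, since the outcome is a factor.
if (requireNamespace("parsnip", quietly = TRUE)) {
  library(parsnip)
  fit_parsnip <- logistic_reg() %>%
    set_engine("glm") %>%
    fit(outcome ~ log(area), data = long)
  print(coef(fit_parsnip$fit))
}
```

The coefficients and their standard errors should match between the two forms, but the residual deviance and AIC are computed differently for grouped and ungrouped data, so those values are not comparable across the two fits.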

llendway commented 11 months ago

Agreed that this would be helpful! One place I might use this is modeling the probability that a member/customer takes a certain action each month for the next 12 months. Using the binomial framework above, this gives me the probability for each month (assuming independence), and from that I can also estimate the number of months in which they will take the action. Without this option, I can model the probability that they will take the action in at least one of the months, but that doesn't quite get me what I need. I hope that explanation helps.
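
To make the distinction in the comment above concrete, here is a small arithmetic sketch. The monthly probability `p_month` is a hypothetical value standing in for a model prediction, and the calculation assumes the months are independent with the same probability, as the comment does.

```r
# Hypothetical per-month probability of taking the action,
# e.g. a prediction from a binomial GLM (value is made up).
p_month  <- 0.15
n_months <- 12

# Expected number of months with an action, under independence.
expected_months <- n_months * p_month

# Probability of an action in at least one of the 12 months.
# This is what a model with a single yes/no outcome would target.
p_at_least_one <- 1 - (1 - p_month)^n_months

# Full distribution of the number of active months (0 through 12).
months_dist <- dbinom(0:n_months, size = n_months, prob = p_month)

print(expected_months)
print(p_at_least_one)
print(round(months_dist, 4))
```

The at-least-one probability can be derived from the monthly probability, but not the other way around without the independence assumption, which is why the per-month binomial model is the more informative of the two.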