vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com

Fix for `survreg` objects #1122

Closed ngreifer closed 1 month ago

ngreifer commented 1 month ago

Fixed a bug that occurred with survreg objects from survival::survreg() when dist was anything other than "exponential". Computing predictions raised a "nonconformable arguments" error because the default method of set_coef() added an extra coefficient where it didn't belong. Updated the NEWS file with this change.

vincentarelbundock commented 1 month ago

Unfortunately, the coefs[-nvar0] line does not work with all models. For example, this tobit model inherits from survreg:

library(AER)
library(marginaleffects)
dat <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/AER/Affairs.csv")
mod1 <- tobit(
    affairs ~ age + yearsmarried + religiousness + occupation + rating,
    data = dat)
avg_slopes(mod1, newdata = dat)
Warning in model[["coefficients"]][] <- coefs[-nvar0]: number of items to
replace is not a multiple of replacement length
Warning in model[["coefficients"]][] <- coefs[-nvar0]: number of items to
replace is not a multiple of replacement length
Warning in model[["coefficients"]][] <- coefs[-nvar0]: number of items to
replace is not a multiple of replacement length
Warning in model[["coefficients"]][] <- coefs[-nvar0]: number of items to
replace is not a multiple of replacement length
Warning in model[["coefficients"]][] <- coefs[-nvar0]: number of items to
replace is not a multiple of replacement length
Warning in model[["coefficients"]][] <- coefs[-nvar0]: number of items to
replace is not a multiple of replacement length
Warning in model[["coefficients"]][] <- coefs[-nvar0]: number of items to
replace is not a multiple of replacement length

          Term    Contrast Estimate Std. Error      z Pr(>|z|)    S  2.5 %
 age           mean(dY/dX)   -0.179     0.0791 -2.267   0.0234  5.4 -0.334
 occupation    mean(dY/dX)    0.326     0.2544  1.282   0.2000  2.3 -0.173
 rating        mean(dY/dX)   -2.285     2.7414 -0.833   0.4046  1.3 -7.658
 religiousness mean(dY/dX)   -1.686     0.4038 -4.176   <0.001 15.0 -2.478
 yearsmarried  mean(dY/dX)    0.554     0.1345  4.120   <0.001 14.7  0.290

Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
Type:  response 

ngreifer commented 1 month ago

I see. In that case, I think tobit should get its own method that falls back to the default method, since that is basically what insight::get_parameters() does. Alternatively, you could add an exception so that the else branch is used when the model inherits from "tobit".

vincentarelbundock commented 1 month ago

It would be helpful to see an example of the breakage. For example, this uses a different dist and still works fine:

library(survival)
fit <- survreg(Surv(futime, fustat) ~ ecog.ps + rx, ovarian, dist='weibull',  scale=1)
marginaleffects::avg_slopes(fit)

In many get_predict() methods, we replace parameters by name rather than by position, as this is usually safer. With an example, I could see if that's possible here.
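
As a rough illustration of name-based replacement, here is a minimal sketch. The helper set_coef_by_name() is hypothetical and is not part of marginaleffects; it only shows the general idea that matching on names silently ignores any parameter (such as a scale term) that has no counterpart in the model's coefficient vector.

```r
library(survival)

# Hypothetical helper, for illustration only (not the marginaleffects API):
# overwrite only those coefficients whose names exist in the model.
set_coef_by_name <- function(model, coefs) {
    shared <- intersect(names(model$coefficients), names(coefs))
    model$coefficients[shared] <- coefs[shared]
    model
}

fit <- survreg(Surv(futime, fustat) ~ ecog.ps + rx,
               data = ovarian, dist = "weibull")

# A parameter vector that also carries the scale term, similar to what an
# extractor returning both location and scale parameters would produce.
new_coefs <- c(coef(fit) + 0.1, "Log(scale)" = log(fit$scale))

fit2 <- set_coef_by_name(fit, new_coefs)

# The extra "Log(scale)" entry is dropped, so the coefficient vector keeps
# its original length and predict() can still multiply it by the design matrix.
p <- predict(fit2, newdata = ovarian, type = "response")
```

The trade-off is that name matching assumes the extractor and the model agree on coefficient names, which may not hold for every model class.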

ngreifer commented 1 month ago

If you don't set the scale, it breaks. Check out survival:::summary.survreg(), which essentially uses position-based matching to extract the coefficients. This is called by get_parameters(). The issue is this:

Currently, get_coef() extracts the location and scale coefficients because it calls get_parameters(), but set_coef() assigns all extracted coefficients to fit$coefficients. That means if the scale coefficient is included, an extra coefficient is set by set_coef(), which means the predictions will fail.
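
A small sketch of that mismatch, using only the survival package. It assumes that the summary table is representative of what the parameter extractor returns, and it mimics set_coef() by assigning the whole table to the coefficients slot; this is an approximation of the behavior described here, not the actual marginaleffects code.

```r
library(survival)

# Scale estimated (the failing case) versus scale fixed (the working case)
fit_free  <- survreg(Surv(futime, fustat) ~ ecog.ps + rx,
                     data = ovarian, dist = "weibull")
fit_fixed <- survreg(Surv(futime, fustat) ~ ecog.ps + rx,
                     data = ovarian, dist = "weibull", scale = 1)

# Location coefficients only
names(coef(fit_free))

# The summary table appends a "Log(scale)" row when the scale is estimated...
rownames(summary(fit_free)$table)

# ...but not when the scale is fixed
rownames(summary(fit_fixed)$table)

# Mimic assigning every extracted parameter to the coefficients slot.
bad <- fit_free
bad$coefficients <- summary(fit_free)$table[, "Value"]

# Per the error reported in this thread, this prediction is expected to fail
# because the coefficient vector no longer matches the design matrix.
res <- tryCatch(
    predict(bad, newdata = ovarian),
    error = function(e) conditionMessage(e)
)
```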

Note that the above does not occur with tobit models, which have their own get_parameters() method that I believe only extracts the location coefficients.

vincentarelbundock commented 1 month ago

Thanks!