vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com

Support: gbm #1061

Closed Gaetano-1996 closed 5 months ago

Gaetano-1996 commented 5 months ago

Hello everyone!

I'm trying to use the avg_comparisons() function from the package to implement g-computation for a causal inference problem. In particular, I would like to use a non-parametric Q-model (gbm).

Models of class gbm are not supported, even though gbm includes a working predict() method. I think adding support would make this workflow easier.

vincentarelbundock commented 5 months ago

What package is that? Can you supply a minimal working example with a small public dataset (ideally built into R or from the Rdatasets archive)?

Gaetano-1996 commented 5 months ago

The package is the gbm package.

I will provide the working example as soon as possible. In the meantime, here is my code:

# ...
gbm.out = gbm(
  formula = formula.out,
  distribution = "bernoulli",
  data = dati,
  weights = dati$psw.nonphys,
  n.trees = 30000,
  interaction.depth = 1,
  n.minobsinnode = 10,
  shrinkage = 0.001,
  bag.fraction = 0.5,
  cv.folds = 5
)

best.iter = gbm.perf(gbm.out, method = "cv")

# counterfactual treated
treated =  subset(dati,nonphys == 1)
cont.trt = predict(gbm.out,type = "response",n.trees = best.iter,
                   newdata = treated)
treated$cont.out = cont.trt
E_y1 = mean(cont.trt)

# counterfactual control: the same treated units with treatment set to 0
# (%>% and mutate() require dplyr)
control = treated %>% 
  mutate(nonphys = 0)

cont.ctrl = predict(gbm.out,type = "response",n.trees = best.iter,
                    newdata = control)
control$cont.out = cont.ctrl
E_y0 = mean(cont.ctrl)

(np.ATT = E_y1 - E_y0)

As you can see, I'm estimating the ATT by selecting only the treated subsample. In this example I did not use the marginal structural model, but the result is the same.
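The manual steps above (predict for the treated as observed, predict again with the treatment set to 0, then average the difference) can be wrapped in a small helper. This is only a sketch: `gcomp_att()` is a hypothetical function, not part of gbm or marginaleffects, and the data are simulated. It is shown with a glm so that it runs without gbm installed.

```r
# Hypothetical helper: g-computation estimate of the ATT for any model
# with a predict() method. Extra arguments in ... are passed to predict().
gcomp_att <- function(model, data, treatment, ...) {
  # keep only treated units
  treated <- data[data[[treatment]] == 1, , drop = FALSE]
  # counterfactual copy of the same units with treatment switched off
  control <- treated
  control[[treatment]] <- 0
  y1 <- predict(model, newdata = treated, ...)
  y0 <- predict(model, newdata = control, ...)
  mean(y1 - y0)
}

# Simulated data (an assumption for illustration, not the data in this thread)
set.seed(2024)
n <- 500
dat <- data.frame(x = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(0.5 * dat$x))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$treat + 0.6 * dat$x))

fit <- glm(y ~ treat + x, family = binomial, data = dat)
att <- gcomp_att(fit, dat, "treat", type = "response")
att
```

With the gbm fit above, the equivalent call would presumably be `gcomp_att(gbm.out, dati, "nonphys", type = "response", n.trees = best.iter)`.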

With a generalized linear model as the Q-model, I was able to reproduce this procedure using avg_comparisons(). With the gbm model, calling the function generates this error:

> avg_comparisons(gbm.out,
+                 variables = "nonphys",
+                 newdata = treated,
+                 wts = "psw.nonphys")
Error: Models of class "gbm" are not supported. Supported model classes include:

  afex_aov, amest, bart, betareg, bglmerMod, bigglm, biglm, blmerMod, bracl, brglmFit, brmsfit,
  brnb, clm, clmm2, clogit, coxph, crch, fixest, flac, flic, gam, Gam, gamlss, geeglm, glimML,
  glm, glmerMod, glmmPQL, glmmTMB, glmrob, glmx, gls, Gls, hetprob, hurdle, hxlr, iv_robust,
  ivpml, ivreg, Learner, lm, lm_robust, lme, lmerMod, lmerModLmerTest, lmrob, lmRob, loess,
  logistf, lrm, mblogit, mclogit, MCMCglmm, mhurdle, mira, mlogit, model_fit, multinom, mvgam,
  negbin, nls, ols, oohbchoice, orm, phyloglm, phylolm, plm, polr, Rchoice, rlmerMod, rq, scam,
  selection, speedglm, speedlm, stanreg, survreg, svyolr, tobit, tobit1, truncreg, workflow,
  zeroinfl

  New modeling packages can usually be supported by `marginaleffects` if they include a working
  `predict()` method. If you believe that this is the case, please file a feature request on
  Github: https://github.com/vincentarelbundock/margi

Given the potential bias induced by misspecification of the outcome model, I believe that supporting non-parametric models would benefit the library.
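For reference, the glm case mentioned above might look like the following sketch. The data are simulated and the variable names (`treat`, `x`, `y`) are assumptions for illustration. The contrast is requested explicitly as 0 versus 1, and weights are omitted for brevity.

```r
library(marginaleffects)

# Simulated data (assumption: not the data used in this thread)
set.seed(2024)
n <- 500
dat <- data.frame(x = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(0.5 * dat$x))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$treat + 0.6 * dat$x))

# Parametric Q-model
q_glm <- glm(y ~ treat + x, family = binomial, data = dat)
treated <- subset(dat, treat == 1)

# Manual g-computation on the treated subsample
p1 <- predict(q_glm, newdata = treated, type = "response")
p0 <- predict(q_glm, newdata = transform(treated, treat = 0), type = "response")
att_manual <- mean(p1 - p0)

# Same quantity through marginaleffects: average 0 -> 1 contrast over the treated
att_me <- avg_comparisons(
  q_glm,
  variables = list(treat = c(0, 1)),
  newdata = treated
)

att_manual
att_me
```

The point estimate from `avg_comparisons()` should agree with `att_manual`, and the marginaleffects output additionally reports a delta-method standard error.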

vincentarelbundock commented 5 months ago

Looks like this model is supported by the mlr3 framework.

https://mlr3extralearners.mlr-org.com/reference/mlr_learners_regr.gbm.html

This means that marginaleffects can already operate on this model naturally:

https://marginaleffects.com/vignettes/machine_learning.html#mlr3
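A rough sketch of that route, assuming the mlr3, mlr3extralearners, and gbm packages are installed. It uses the `regr.gbm` learner linked above on simulated data with a continuous outcome; the hyperparameter values are arbitrary, and the Bernoulli case from this thread is not covered here.

```r
library(mlr3)
library(mlr3extralearners)  # assumed to provide the "regr.gbm" learner
library(marginaleffects)

# Simulated data with a continuous outcome (assumption for illustration)
set.seed(2024)
n <- 500
dat <- data.frame(x = rnorm(n))
dat$treat <- as.numeric(rbinom(n, 1, plogis(0.5 * dat$x)))
dat$y <- 1 + 2 * dat$treat + dat$x^2 + rnorm(n)

# Wrap the data in a regression task and train a gbm learner on it
task <- as_task_regr(dat, target = "y", id = "sim")
learner <- lrn("regr.gbm", n.trees = 500, interaction.depth = 2, shrinkage = 0.05)
learner$train(task)

# ATT-style estimate: average 0 -> 1 contrast over the treated units.
# newdata must be supplied explicitly for mlr3 learners.
treated <- subset(dat, treat == 1)
att <- avg_comparisons(
  learner,
  variables = list(treat = c(0, 1)),
  newdata = treated
)
att
```

As far as I know, standard errors are not available for models fit this way, so only the point estimate is reported.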