Support: `SuperLearner`

vincentarelbundock commented 1 year ago

Hello. As I wrote on Twitter, I'd like to add support for the SuperLearner R package, available on CRAN. Looking at the provided examples, one thing that sets it apart from the other supported classes is that a call to the SuperLearner functions returns essentially a list of fitted models rather than a single one. It currently supports the following estimators:

> listWrappers()
All prediction algorithm wrappers in SuperLearner:

 [1] "SL.bartMachine"      "SL.bayesglm"         "SL.biglasso"        
 [4] "SL.caret"            "SL.caret.rpart"      "SL.cforest"         
 [7] "SL.earth"            "SL.extraTrees"       "SL.gam"             
[10] "SL.gbm"              "SL.glm"              "SL.glm.interaction" 
[13] "SL.glmnet"           "SL.ipredbagg"        "SL.kernelKnn"       
[16] "SL.knn"              "SL.ksvm"             "SL.lda"             
[19] "SL.leekasso"         "SL.lm"               "SL.loess"           
[22] "SL.logreg"           "SL.mean"             "SL.nnet"            
[25] "SL.nnls"             "SL.polymars"         "SL.qda"             
[28] "SL.randomForest"     "SL.ranger"           "SL.ridge"           
[31] "SL.rpart"            "SL.rpartPrune"       "SL.speedglm"        
[34] "SL.speedlm"          "SL.step"             "SL.step.forward"    
[37] "SL.step.interaction" "SL.stepAIC"          "SL.svm"             
[40] "SL.template"         "SL.xgboost"

It is therefore necessary to handle, e.g., extracting the coefficients from each estimator and somehow combining them (e.g., by averaging). One question: taking the get_coef function as an example, should I manually extract the coefficients from the individual estimators in the library?

Originally posted by @lorenzoFabbri in https://github.com/vincentarelbundock/marginaleffects/issues/49#issuecomment-1624928196

vincentarelbundock commented 1 year ago

The main purpose of get_coef() is to extract coefficients that will then be (slightly) modified and fed back to set_coef(). The latter function is then used to modify the model and make (slightly) different predictions to get the derivatives of predictions/contrasts/slopes with respect to the parameters of the model. This is what ultimately gives us the Delta Method standard errors.
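As a rough illustration of the procedure described above, here is a minimal base R sketch of delta-method standard errors obtained by perturbing coefficients. The helpers `get_coef_sketch()` and `set_coef_sketch()` and the function `avg_pred()` are hypothetical stand-ins written for this example; they are not the marginaleffects internals, and the model and step size are arbitrary choices.

```r
# Fit a single parametric model (the setting in which the delta method applies).
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Hypothetical stand-ins for get_coef() / set_coef().
get_coef_sketch <- function(model) coef(model)
set_coef_sketch <- function(model, beta) {
  model$coefficients <- beta  # predict() with newdata reads this slot
  model
}

# Quantity of interest: the average predicted probability.
# newdata is passed explicitly so predictions are recomputed from the coefficients.
avg_pred <- function(model) {
  mean(predict(model, newdata = mtcars, type = "response"))
}

beta <- get_coef_sketch(fit)
eps <- 1e-6  # assumed finite-difference step

# Numerical derivative of the quantity with respect to each coefficient.
jac <- vapply(seq_along(beta), function(i) {
  b <- beta
  b[i] <- b[i] + eps
  (avg_pred(set_coef_sketch(fit, b)) - avg_pred(fit)) / eps
}, numeric(1))

# Delta method: sqrt(J V J'), with V the coefficient covariance matrix.
se <- sqrt(drop(t(jac) %*% vcov(fit) %*% jac))
se
```

The sketch only works because the model has one coefficient vector and one covariance matrix, which is exactly what an ensemble of heterogeneous learners lacks.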

I don't know super learner, but my guess is that this doesn't make much sense as a procedure in a multi-model "ensemble" setup like this. You probably won't be able to get standard errors in the classical statistics way, and may have to rely on bootstrapping or somesuch.
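A minimal sketch of the bootstrap alternative, in base R only. The "ensemble" here is an assumed stand-in (an equal-weight average of two linear models), not SuperLearner itself, and `fit_ensemble()`, `predict_ensemble()`, and `avg_contrast()` are hypothetical helpers. The point is only that the standard error comes from refitting on resampled data, with no coefficients involved.

```r
set.seed(1)

# Stand-in ensemble: two base R learners fitted on the same data.
fit_ensemble <- function(d) {
  list(
    additive    = lm(mpg ~ hp + wt, data = d),
    interaction = lm(mpg ~ hp * wt, data = d)
  )
}

# Equal-weight average of the learners' predictions (an assumption;
# a real super learner would estimate the weights by cross-validation).
predict_ensemble <- function(fits, newdata) {
  (predict(fits$additive, newdata = newdata) +
     predict(fits$interaction, newdata = newdata)) / 2
}

# Quantity of interest: average change in prediction when hp increases by 10.
avg_contrast <- function(fits, d) {
  shifted <- transform(d, hp = hp + 10)
  mean(predict_ensemble(fits, shifted) - predict_ensemble(fits, d))
}

estimate <- avg_contrast(fit_ensemble(mtcars), mtcars)

# Nonparametric bootstrap: resample rows, refit the ensemble, recompute.
boot <- replicate(200, {
  d <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  avg_contrast(fit_ensemble(d), d)
})

se_boot <- sd(boot)
c(estimate = estimate, se = se_boot)
```

Refitting every learner on every resample is the main cost of this approach, and it grows quickly with a large library of estimators.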

If we are giving up on the delta method, we may not even need get_coef() at all.

Please note that it is summer vacation season, so I may not be able to answer messages promptly or offer as much support as I normally would.

vincentarelbundock commented 1 year ago

I am still very interested in this, but I am closing the issue to keep the repository "clean". I added SuperLearner to the list of desired future support in the issue where I consolidate all these requests: https://github.com/vincentarelbundock/marginaleffects/issues/49

Hopefully someone will rise to the occasion, or I will find some time in the future to look into it.

lorenzoFabbri commented 1 year ago

I am also very interested, but I will only be able to work toward a proper solution slowly. I need to wrap up a paper now.

vincentarelbundock commented 1 month ago

I believe this might be supported via tidymodels already: https://www.alexpghayes.com/post/2019-04-13_implementing-the-superlearner-with-tidymodels/

lorenzoFabbri commented 1 month ago

Looks like it, yes!

Support: `SuperLearner` #834