zachmayer / caretEnsemble

caret models all the way down :turtle:

ClassPredictions not saving #254

Closed lsemployeeoftheyear closed 1 year ago

lsemployeeoftheyear commented 1 year ago

Hey all,

Forgive me if this is posted somewhere and I missed it. I've been trying to run the below on my machine and getting the error

Error in check_caretList_model_types(list_of_models) : 
  No predictions saved by train. Please re-run models with trainControl set with savePredictions = TRUE.

If I'm not mistaken I have my trainControl set to save final predictions, so I'm a little unclear regarding what my issue might be. I did check the list object to verify that class predictions are in fact not saving. Code below:

### Minimal runnable code:
library(caretEnsemble)
library(caret)
classifiers = c('ada', 'bayesglm')
set.seed(1)
dat <- caret::twoClassSim(100)
X_train4_imp <- dat[,1:5]
y_train4_stack <- dat[["Class"]]

cl <- makeCluster(4)
registerDoParallel(cl)
paths <- .libPaths()
clusterExport(cl,c('X_train4_imp','y_train4_stack', 'paths'))
clusterEvalQ(cl,expr= {
  .libPaths(paths)
})

my_control <- trainControl(
  method="none"
  , number=4
  # , adaptive = list(
  #    min = 2
  #  , alpha = .05
  #  , method = 'gls'
  #  , complete = TRUE
  # )
  , search = 'random'
  , savePredictions='final'
  , preProcOptions = c('medianImpute', 'zv')
  #na.action = na.pass,
  , classProbs=TRUE
  , index=createResample(y_train4_stack, 4)
  , summaryFunction=twoClassSummary
  , allowParallel = TRUE
  , verboseIter = TRUE
)
set.seed(1)
model_list4 <- caretList(
  x=X_train4_imp,
  y=y_train4_stack,
  trControl=my_control,
  metric="logLoss"
  ,maximize=FALSE
)
set.seed(1)
glm_ensemble_relax <- caretStack(
  model_list4,
  method="glmnet",
  metric="logLoss"
  ,maximize = FALSE
  ,trControl=trainControl(
    method="adaptive_cv"
    , number=4
    , adaptive = list(
       min = 2
       , alpha = .15
       , method = 'gls'
       , complete = TRUE
      )
    , savePredictions="final"
    , preProcOptions = c('medianImpute', 'zv')
    , classProbs=TRUE
    , summaryFunction=twoClassSummary
    , allowParallel = TRUE
    , tuneGrid = expand.grid(.relax=TRUE
                             , .gamma = c(0
                                          ,.5
                                          ,1)
                             , .lambda = seq(1
                                             , 100
                                             , 1)
                             )
  )
)
stopCluster(cl)

zachmayer commented 1 year ago

Your caretList call doesn't work as posted: it never passes a methodList or tuneList (classifiers is defined but unused). Copy/paste your code into a fresh R session to check.

zachmayer commented 1 year ago

You need to use method = "boot" or method = "cv". method = "none" fits each model once on the full training set with no resampling, so there are no held-out predictions to stack.
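
To illustrate the point, here is a minimal sketch that fits the same model with and without resampling and inspects the saved predictions. The object names (fit_none, fit_cv) and the choice of "glm" are illustrative assumptions, not part of the original report.

```r
library(caret)

set.seed(1)
dat <- twoClassSim(100)

# No resampling: the model is fit once on all rows, so nothing is held out
fit_none <- train(
  Class ~ ., data = dat, method = "glm",
  trControl = trainControl(method = "none", classProbs = TRUE,
                           savePredictions = "final")
)

# 4-fold CV: every row is held out once and its prediction is saved
fit_cv <- train(
  Class ~ ., data = dat, method = "glm",
  trControl = trainControl(method = "cv", number = 4, classProbs = TRUE,
                           savePredictions = "final")
)

# Compare what each fit stored in $pred
str(fit_none$pred)
nrow(fit_cv$pred)
```

Only the resampled fit should carry held-out predictions in $pred, which is what caretStack needs as training data for the meta-model.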

zachmayer commented 1 year ago

Also note that twoClassSummary outputs ROC (aka AUC), Sens, and Spec, not logLoss, so you should use metric = "ROC" with maximize = TRUE.
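
If log loss really is the metric you want, one option is caret's mnLogLoss summary function, which reports a column named logLoss. The sketch below is a hedged illustration under that assumption; the object names (ctrl_roc, ctrl_logloss, fit_roc, fit_logloss) and the "glm" model are placeholders chosen for the example.

```r
library(caret)

set.seed(1)
dat <- twoClassSim(100)

# twoClassSummary reports ROC, Sens and Spec, so optimize "ROC" (higher is better)
ctrl_roc <- trainControl(method = "cv", number = 4, classProbs = TRUE,
                         summaryFunction = twoClassSummary)
fit_roc <- train(Class ~ ., data = dat, method = "glm",
                 metric = "ROC", maximize = TRUE, trControl = ctrl_roc)

# mnLogLoss reports logLoss, so optimize "logLoss" (lower is better)
ctrl_logloss <- trainControl(method = "cv", number = 4, classProbs = TRUE,
                             summaryFunction = mnLogLoss)
fit_logloss <- train(Class ~ ., data = dat, method = "glm",
                     metric = "logLoss", maximize = FALSE,
                     trControl = ctrl_logloss)

# The metric passed to train() must match a column produced by the summary function
names(fit_roc$results)
names(fit_logloss$results)
```

The key design point is that metric has to name something the summaryFunction actually returns; otherwise caret cannot select on it.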

zachmayer commented 1 year ago

Anyways, I cleaned up your example, made it more minimal, and fixed the bugs in your code. It works fine once it's fixed:

# Setup
rm(list = ls(all = TRUE))
gc(reset = TRUE)
library(caretEnsemble)
library(caret)
set.seed(1)
dat <- caret::twoClassSim(100)
y  <- dat[["Class"]]

# Shared control object
my_control <- trainControl(
  method = "boot",
  number = 4,
  search = 'random',
  savePredictions ='final',
  preProcOptions = c('medianImpute', 'zv'),
  classProbs = TRUE,
  index = createResample(y, 4),
  summaryFunction = twoClassSummary,
  verboseIter = TRUE
)

# Fit the base models
set.seed(1)
model_list <- caretList(
  x = dat[,1:5],
  y = y,
  trControl = my_control,
  metric = "ROC",
  maximize = TRUE,
  methodList = c('ada', 'bayesglm')
)

# Check that the base models have saved (stackable) predictions
check = sapply(model_list, function(x){
  nrow(x$pred)>0
})
stopifnot(all(check))
print(check)

# Stack the models
glm_ensemble_relax <- caretStack(
  model_list,
  method="glmnet",
  metric="ROC",
  maximize = TRUE,
  tuneGrid = expand.grid(
    .alpha = c(0, .5 ,1),
    .lambda = seq(1, 100, 1)
  ),
  trControl=trainControl(
    method="adaptive_cv",
    number=4,
    adaptive = list(
      min = 2,
      alpha = .15,
      method = 'gls',
      complete = TRUE
    ),
    savePredictions="final",
    preProcOptions = c('medianImpute', 'zv'),
    classProbs=TRUE,
    summaryFunction=twoClassSummary
  )
)
glm_ensemble_relax

lsemployeeoftheyear commented 1 year ago

Cool, thanks a lot! Will adaptive cv/adaptive boot work then, or does that need to get changed as well?

zachmayer commented 1 year ago

I haven't used adaptive CV or adaptive bootstrap: try it out and see!

zachmayer commented 1 year ago

Be sure to keep your code neat and minimal. There were a few bugs in your snippet above, and as you modify your code to use other methods like adaptive boot, be careful about new bugs creeping in!