zachmayer / caretEnsemble

caret models all the way down :turtle:

Ensemble model training fails because x$control$savePredictions is not TRUE #178

Closed JasonCEC closed 8 years ago

JasonCEC commented 8 years ago

The most recent development version of caretEnsemble makes an incorrect assumption about the value of x$control$savePredictions, which causes ensemble training to fail.

A reproducible example is below, and includes a workaround:

## Reproducible bug for control$savePredictions is not TRUE
library(caret)
library(caretEnsemble)

exampleTrain <- iris[iris$Species != "setosa", ]
exampleTrain$Species <- droplevels(exampleTrain$Species)

exampleModel <- caretList(Species ~ ., 
                         data = exampleTrain,
                         trControl= trainControl(
                           method='repeatedcv',
                           number = 10,
                           ## repeated five times
                           repeats = 5,
                           classProbs=TRUE,
                           savePredictions=TRUE,
                           index=createFolds(exampleTrain$Species, 2),
                           summaryFunction=twoClassSummary
                         ),
                         metric='ROC',
                         tuneList=list(
                           rf1=caretModelSpec(method='rf', tuneLength=3),
                           nn1=caretModelSpec(method='nnet', tuneLength=3, trace=FALSE)
                         )
)

## This is the failing function
exampleEnsemble <-  caretEnsemble(exampleModel)
# it returns: Error: x$control$savePredictions is not TRUE 

## This can be fixed by: 
exampleModel$rf1$control$savePredictions <- TRUE
exampleModel$nn1$control$savePredictions <- TRUE
# This is an annoying fix when you have ~40 models in the initial ensemble 

## The model now works
exampleEnsemble <-  caretEnsemble(exampleModel)

print(exampleEnsemble)
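
With many models, the per-model workaround above can be applied in a single pass. The following is a minimal sketch that runs without caret installed: `mockModels` is a hypothetical stand-in for a caretList, and the string "all" is an assumption about what newer caret versions store in `savePredictions`.

```r
# Hypothetical stand-in for a caretList: a named list whose elements each
# carry a $control list, mirroring only the field touched by the workaround.
mockModels <- list(
  rf1 = list(control = list(savePredictions = "all")),
  nn1 = list(control = list(savePredictions = "all"))
)

# Apply the workaround to every model. Assigning into mockModels[]
# keeps the names and attributes of the original list.
mockModels[] <- lapply(mockModels, function(model) {
  model$control$savePredictions <- TRUE
  model
})

# Check that every model would now pass an isTRUE() test.
allFixed <- all(vapply(
  mockModels,
  function(model) isTRUE(model$control$savePredictions),
  logical(1)
))
```

With a real caretList, the same `lapply()` call should be able to replace the two manual assignments, however many models the list holds.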

JasonCEC commented 8 years ago

This may have been caused by an update to caret: https://github.com/topepo/caret/blob/c4a76780810875d9b957ee2cc0f17db6f0c1786c/pkg/caret/R/train.default.R#L164

The check in caretEnsemble is here: https://github.com/zachmayer/caretEnsemble/blob/308fa6e672f6bf2c3654d92f30ec409f035ae889/R/helper_functions.R#L170
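
The error message is consistent with a `stopifnot()`-style check on the control value, although that is an assumption rather than a quote of the linked line. A small sketch of why such a check would reject a string value, using a hypothetical `control` list and assuming newer caret stores "all" when `savePredictions = TRUE` is requested:

```r
# Hypothetical control list; "all" is the assumed stored value in newer caret.
control <- list(savePredictions = "all")

# stopifnot() only accepts logical TRUE, so a non-empty string still fails.
checkResult <- tryCatch(
  {
    stopifnot(control$savePredictions)
    "passed"
  },
  error = function(e) conditionMessage(e)
)

# The same check succeeds once the value is the logical TRUE.
control$savePredictions <- TRUE
recheckResult <- tryCatch(
  {
    stopifnot(control$savePredictions)
    "passed"
  },
  error = function(e) conditionMessage(e)
)
```

Here `checkResult` holds the error message and `recheckResult` holds "passed", which matches the behaviour of the workaround in the original report.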

I'd be happy to submit the pull request if you could outline the appropriate fix.

zachmayer commented 8 years ago

Another check also occurs here: https://github.com/zachmayer/caretEnsemble/blob/308fa6e672f6bf2c3654d92f30ec409f035ae889/R/helper_functions.R#L53

In general, caretEnsemble needs some work, but I haven't had time recently. I'll try to do an update soon to fix this and other issues.

I also think this change to caret was made at my request. I'd asked to be able to save just the predictions for the best model, which really cuts down on the size of models like glmnet and gbm that can have hundreds of submodels.

JasonCEC commented 8 years ago

Would you like me to issue a pull request just removing those lines, or checking that it's not "none" (the new alternative value instead of FALSE)?

zachmayer commented 8 years ago

If you want to submit a pull request checking for "not none", I would appreciate it!
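
A "not none" check could look roughly like the sketch below. `predictionsSaved` is a hypothetical helper, not a function from caret or caretEnsemble, and it assumes `savePredictions` is either a logical (older caret) or one of the strings "all", "final", or "none" (newer caret).

```r
# Hypothetical helper: TRUE when hold-out predictions were saved.
predictionsSaved <- function(savePredictions) {
  # Older caret: a plain logical flag.
  if (is.logical(savePredictions)) {
    return(isTRUE(savePredictions))
  }
  # Newer caret (assumed): a single string, where "none" means nothing was saved.
  is.character(savePredictions) &&
    length(savePredictions) == 1 &&
    !is.na(savePredictions) &&
    savePredictions != "none"
}

# Evaluate the helper on each value the setting is assumed to take.
checks <- vapply(
  list(TRUE, FALSE, "all", "final", "none"),
  predictionsSaved,
  logical(1)
)
```

Accepting both the logical and the string form keeps the check working against models trained with either caret version.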

JasonCEC commented 8 years ago

I'm working on the pull request, but I don't understand what needs to change at the location you linked (R/helper_functions.R#L53).

All of the observations are still there, I believe?

Can I submit the pull request fixing only line 170 of helper_functions.R?

JasonCEC commented 8 years ago

If I don't need to change anything on line 53, the above pull request should close this issue.

terrytangyuan commented 8 years ago

@JasonCEC You might want to submit a PR in zachmayer/caretEnsemble instead of in your forked copy?

Mosquito00 commented 8 years ago

@JasonCEC: Is there another possibility for ensemble model training in R apart from caretEnsemble?

Thank you.

JasonCEC commented 8 years ago

@Mosquito00 You could build the ensemble yourself, but caretEnsemble does much of the hard work for you, even if it's a bit behind caret at the moment.
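
As a rough illustration of building an ensemble by hand, the sketch below averages the predicted probabilities of two base-R `glm()` models on the same two-class iris subset used in the report. The choice of models, the 70/30 split, and the unweighted average are all assumptions made for the example; this is not a description of how caretEnsemble combines models.

```r
# Two-class subset of iris, as in the original report.
twoClass <- droplevels(iris[iris$Species != "setosa", ])
twoClass$isVirginica <- as.integer(twoClass$Species == "virginica")

# Simple train/test split (70 of the 100 rows for training).
set.seed(42)
trainRows <- sample(nrow(twoClass), 70)
trainSet <- twoClass[trainRows, ]
testSet <- twoClass[-trainRows, ]

# Two base models that look at different predictors.
sepalModel <- glm(isVirginica ~ Sepal.Length + Sepal.Width,
                  data = trainSet, family = binomial)
petalModel <- glm(isVirginica ~ Petal.Length,
                  data = trainSet, family = binomial)

# Predicted probability of "virginica" from each model on the test rows.
sepalProb <- predict(sepalModel, newdata = testSet, type = "response")
petalProb <- predict(petalModel, newdata = testSet, type = "response")

# Unweighted average of the two probabilities, then a 0.5 threshold.
ensembleProb <- (sepalProb + petalProb) / 2
ensembleClass <- ifelse(ensembleProb > 0.5, "virginica", "versicolor")

# Share of test rows the averaged model labels correctly.
ensembleAccuracy <- mean(ensembleClass == testSet$Species)
```

A hand-built ensemble like this also needs its own resampling and weighting logic, which is the part caretEnsemble is meant to handle.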

JasonCEC commented 8 years ago

I believe this has been closed on the current branch with commit "fix for savePredictions".