zachmayer / caretEnsemble

caret models all the way down :turtle:

caretEnsemble fails with an error about type of predictions #210

Closed karruo closed 7 years ago

karruo commented 7 years ago

I am fitting an ensemble of models in a regression setting and have noticed the same issue with both caretEnsemble and caretStack. Some methods (model types) lead to an error stating that the predictions are not all of the same type but a mix of numeric and character. This is strange, since predicting directly with each of the fitted individual models gives numeric predictions in every case. Below is a reproducible example. The exact error message is:

Error in check_bestpreds_preds(modelLibrary) : Component models do not all have the same type of predicitons. Predictions are a mix of numeric, character.

Here is the example:

# example based on cpus data in MASS and follows Venables & Ripley
# MASS 3rd ed, page 311
require(MASS)
data(cpus)
y <- log10(cpus$perf)
X <- cpus[, 2:8]

require(caret)
require(caretEnsemble)

fitCtrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10, savePredictions = "final")

models <- caretList(
  x = X, y = y, preProc = c("center", "scale"),
  tuneList = list(
    lm = caretModelSpec(method = "lm"),
    pls = caretModelSpec(method = "pls", tuneLength = 30),
    ridge = caretModelSpec(method = "ridge",
      tuneGrid = expand.grid(.lambda = seq(0, .1, len = 15))),
    enet = caretModelSpec(method = "enet",
      tuneGrid = expand.grid(.lambda = c(0, .01, 1), .fraction = seq(.05, 1, len = 20))),
    nnet = caretModelSpec(method = "pcaNNet",
      tuneGrid = expand.grid(.decay = c(0, .01, .1), .size = 1:10)),
    svm = caretModelSpec(method = "svmRadial", tuneLength = 10),
    mars = caretModelSpec(method = "earth",
      tuneGrid = expand.grid(.degree = 1:2, .nprune = 2:38)),
    kknn = caretModelSpec(method = "kknn",
      tuneGrid = expand.grid(.kmax = 1:20, .distance = 1:10, .kernel = "optimal")),
    rpart = caretModelSpec(method = "rpart2", tuneLength = 10),
    som = caretModelSpec(method = "xyf",
      tuneGrid = expand.grid(.xdim = seq(5, 45, 10), .ydim = seq(5, 45, 10),
                             .xweight = c(.5, 1, 2), .topo = "hexagonal")),
    gbm = caretModelSpec(method = "gbm",
      tuneGrid = expand.grid(.interaction.depth = seq(1, 7, 2), .n.trees = seq(100, 1000, 50),
                             .shrinkage = c(.01, .1), .n.minobsinnode = c(5, 10))),
    treebag = caretModelSpec(method = "treebag"),
    rf = caretModelSpec(method = "rf", tuneLength = 10, ntrees = 1000, importance = TRUE),
    cubist = caretModelSpec(method = "cubist",
      tuneGrid = expand.grid(.committees = c(10, 50, 100, 200, 300, 500),
                             .neighbors = c(0, 1, 3, 5, 7, 9)))
  ),
  trControl = fitCtrl
)

results <- resamples(models)
summary(results)

ensb <- caretEnsemble(models)  ## gives an error about type of predictions:
# Error in check_bestpreds_preds(modelLibrary) :
#   Component models do not all have the same type of predicitons.
#   Predictions are a mix of numeric, character.

washcycle commented 7 years ago

I narrowed this down to the xyf model. If you remove it, this code will work.

It appears that xyf saves its numeric predictions as character vectors. This might be a bug in caret itself.

Another interesting fact: if you keep xyf but remove its tuneGrid parameter, it will also work.
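A quick way to see which component is responsible is to inspect the type of the saved predictions in each fitted model. The following is only a sketch: it assumes each model keeps its held-out predictions in `$pred$pred` (the slot used by the workaround later in this thread), and it runs on a made-up stand-in list rather than a real caretList, so `mock_models` and its values are purely illustrative.

```r
# Stand-in for a fitted caretList (hypothetical data): each element mimics
# the `$pred` data frame that caret keeps when savePredictions is set.
mock_models <- list(
  lm  = list(pred = data.frame(pred = c(1.2, 0.8, 1.9),
                               obs  = c(1.0, 1.1, 2.0))),
  xyf = list(pred = data.frame(pred = c("1.1", "0.9", "2.1"),
                               obs  = c(1.0, 1.1, 2.0),
                               stringsAsFactors = FALSE))
)

# TRUE where the saved predictions are numeric, FALSE otherwise
is_num <- vapply(mock_models, function(m) is.numeric(m$pred$pred), logical(1))

# Names of the models whose saved predictions are not numeric
offenders <- names(is_num)[!is_num]
print(offenders)
```

With a real model list, replacing `mock_models` by the output of `caretList()` should point at the same kind of offender, provided the predictions were saved.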

jeonghyunwoo commented 7 years ago

When the modelList includes 'gam', 'gamLoess', or 'gamSpline', the same error message also occurs.

zachmayer commented 7 years ago

This is because the models failed. You can't run gam and gamLoess/gamSpline in the same caretList (or even the same R session).

Basically, both packages are a little stupid, and the namespaces collide, which causes both packages to fail. If you want to do gams, you have to pick one or the other.
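The collision can be caught before any model is fitted. Below is a minimal sketch of such a check. The mapping from caret method to backing package is an assumption not stated in this thread (caret's "gam" method uses mgcv, while "gamLoess" and "gamSpline" use the gam package), and `has_gam_conflict` is a hypothetical helper, not part of caret or caretEnsemble.

```r
# Assumed mapping from caret method name to the package that implements it
backing_pkg <- c(gam = "mgcv", gamLoess = "gam", gamSpline = "gam")

# Hypothetical helper: TRUE if the requested methods pull in both packages
has_gam_conflict <- function(methods) {
  pkgs <- unique(backing_pkg[intersect(methods, names(backing_pkg))])
  length(pkgs) > 1
}

mixed   <- has_gam_conflict(c("lm", "gam", "gamLoess"))        # both packages
one_pkg <- has_gam_conflict(c("lm", "gamLoess", "gamSpline"))  # gam package only
no_gam  <- has_gam_conflict(c("lm", "rf"))                     # neither
print(c(mixed = mixed, one_pkg = one_pkg, no_gam = no_gam))
```

Because the problem is said to affect the whole R session, a check on the method list alone would not notice a package that was already attached earlier.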

zachmayer commented 7 years ago

Yeah, a lot of apparent problems with caret / caretEnsemble are actually the underlying models failing in cryptic ways. I usually stick to the tried and true models, like gbm, ranger, and glmnet. Between those 3 you often don't need anything else.

jeonghyunwoo commented 7 years ago

Thank you very much for quick and kind answers.^^

      1. 10:31 PM, "Zach Mayer" notifications@github.com wrote:

Closed #210 https://github.com/zachmayer/caretEnsemble/issues/210.


zachmayer commented 7 years ago

No problem. Thanks for using my package!

karruo commented 7 years ago

Hi, when I faced this error and examined it a bit, I noticed it could relate to the format in which predictions are returned by the respective functions: a one-column matrix vs. a vector, for example. I have not had time to dig deeper since, but it is perhaps worth checking whether that is the case. If so, it may be easy to enforce a consistent format.

Regards, Kari NB. Thanks for a wonderfully useful package!

Zach Mayer notifications@github.com wrote on 20.10.2016 at 16:38:

No problem. Thanks for using my package!


zachmayer commented 7 years ago

Interesting. That sounds like it may be a bug in caret: caret should always return predictions in a consistent format (but that's hard, because the underlying packages can change).
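To illustrate what a consistent format could look like, here is a small sketch that flattens the shapes mentioned above (a one-column matrix, a one-column data frame, a character vector, or a factor) into a plain numeric vector. `as_numeric_vector` is a hypothetical helper written for this example; it is not how caret or caretEnsemble handle predictions.

```r
# Hypothetical helper: coerce regression predictions to a plain numeric vector
as_numeric_vector <- function(p) {
  if (is.matrix(p) || is.data.frame(p)) {
    stopifnot(ncol(p) == 1)  # only single-column inputs make sense here
    p <- p[, 1]              # drop the matrix / data frame wrapper
  }
  if (is.factor(p)) p <- as.character(p)  # use the labels, not the level codes
  as.numeric(p)
}

from_matrix <- as_numeric_vector(matrix(c(1.5, 2, 3.25), ncol = 1))
from_text   <- as_numeric_vector(c("1.5", "2", "3.25"))
from_factor <- as_numeric_vector(factor(c("1.5", "2", "3.25")))
print(identical(from_matrix, from_text) && identical(from_text, from_factor))
```

Going through `as.character()` for factors matters because `as.numeric()` on a factor returns the integer level codes rather than the values.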

JeffHebert commented 6 years ago

I've found that some models (like extraTrees) return predictions as a character vector. I don't know why this happens. I was able to 'correct' the Error in check_bestpreds_preds(modelLibrary) with code like this:

modelLibrary$extraTrees$pred$pred <- as.numeric(modelLibrary$extraTrees$pred$pred)

After modifying the predictions, caretEnsemble works as expected.
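The same workaround can be applied to every model in the list at once. This is a sketch for the regression case only, assuming the saved predictions live in `$pred$pred` as in the line above; `coerce_preds_to_numeric` is a hypothetical helper and the mock list stands in for a real caretList.

```r
# Hypothetical helper: convert non-numeric saved predictions to numeric,
# stopping if a value is not actually a number stored as text.
coerce_preds_to_numeric <- function(model_list) {
  for (name in names(model_list)) {
    preds <- model_list[[name]]$pred$pred
    if (!is.numeric(preds)) {
      converted <- suppressWarnings(as.numeric(as.character(preds)))
      if (any(is.na(converted) & !is.na(preds))) {
        stop("Predictions of model '", name, "' are not numbers stored as text")
      }
      model_list[[name]]$pred$pred <- converted
    }
  }
  model_list
}

# Made-up stand-in for a fitted model list
mock_library <- list(
  rf         = list(pred = data.frame(pred = c(1.4, 2.1, 3.0))),
  extraTrees = list(pred = data.frame(pred = c("1.5", "2", "3.25"),
                                      stringsAsFactors = FALSE))
)

fixed <- coerce_preds_to_numeric(mock_library)
print(vapply(fixed, function(m) class(m$pred$pred)[1], character(1)))
```

This only patches the symptom in the saved predictions; whether new predictions from the affected model also need converting would have to be checked separately.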

P.S. I really like caretEnsemble!!!

zachmayer commented 6 years ago

If you want to submit a PR with a unit test + a fix, please do!

washcycle commented 6 years ago

I wonder if this is a fix that should be applied to caret itself.

On Wed, Jul 5, 2017, 15:03 Zach Mayer notifications@github.com wrote:

If you want to submit a PR with a unit test + a fix, please do!


ran88dom99 commented 6 years ago

gbm crashes for me. The following code needs to be modified to test all models:

#### THIS SCRIPT TESTS PAIRS OF MODELS: WHICH ONES CAUSE caretEnsemble TO FAIL?
# Assumes `training` and `testing` (outcome in column 1), `tuneLength`, and the
# helper `printPredMets()` already exist in the session; they are not defined here.
#devtools::install_github("zachmayer/caretEnsemble")
#update.packages(oldPkgs="caret", ask=FALSE)
sessionInfo()
library("caret")
library("caTools")
library("caretEnsemble")
adaptControl <- trainControl(method = "cv", number = 10, search = "random")
# "extraTrees" and "gbm" are left out of the list for now
stackmodels <- c("rpart", "glm", "cubist", "earth", "bagEarth",
                 "lasso", "Rborist", "rlm", "nnet", "pcaNNet", "avNNet", "pcr", "ppr",
                 "enet", "blassoAveraged", "leapBackward", "BstLm", "gamboost",
                 "xgbTree", "svmLinear2")

for (o in stackmodels) {
  for (p in stackmodels) {
    twomodels <- c(o, p)
    print(twomodels)
    writeout <- paste(twomodels, collapse = ",")  # e.g. "rpart,glm"
    if (o == p) next

    # Fit the pair of base models and print a few test-set predictions
    try({
      set.seed(222)
      model_list <- caretList(
        x = training[, -1],
        y = training[, 1],
        trControl = adaptControl,
        methodList = twomodels
      )
      z <- as.data.frame(predict(model_list, newdata = head(testing[, -1])))
      print(z)
      #xyplot(resamples(model_list))
      #modelCor(resamples(model_list))
    })

    # Greedy ensemble: `failed` stays 1 unless every step below succeeds
    failed <- 1
    try({
      greedy_ensemble <- caretEnsemble(model_list)
      summary(greedy_ensemble)

      model_preds <- lapply(model_list, predict, newdata = testing[, -1], type = "raw")
      #model_preds <- lapply(model_preds, function(x) x[,"M"])
      model_preds <- data.frame(model_preds)
      ens_preds <- predict(greedy_ensemble, newdata = testing[, -1], type = "raw")
      model_preds$ensemble <- ens_preds
      model_preds

      varImp(greedy_ensemble)

      overRMSE <- (-1)  #greedy_ensemble$error$RMSE
      allmodel <- "caretEnsGreedyGlm"
      printPredMets(predicted.outcomes = ens_preds, overRMSE = overRMSE, hypercount = "full")
      failed <- 0
    })
    if (failed == 1) write.table(paste(writeout, "greedy", sep = ","), file = "carensfails.csv",
                                 quote = FALSE, sep = ",", row.names = FALSE,
                                 col.names = FALSE, append = TRUE)

    #caTools::colAUC(model_preds, testing$Class)

    # Stacked ensembles, using each of the two methods in turn as the meta-model
    for (i in twomodels) {
      failed <- 1
      try({
        stack_ensemble <- caretStack(
          model_list,
          method = i,
          tuneLength = tuneLength,
          trControl = adaptControl
        )
        #$ens_model$finalModel
        ens_preds <- predict(stack_ensemble, newdata = testing[, -1], type = "raw")
        overRMSE <- (-1)  #min(stack_ensemble$error$RMSE, na.rm = T)
        allmodel <- paste("caretEnstk", i, sep = " ")
        printPredMets(predicted.outcomes = ens_preds, overRMSE = overRMSE, hypercount = "full")
        failed <- 0
      })
      # Logged inside the loop so that a failure of either meta-model is recorded
      # (the original only logged the outcome of the last one)
      if (failed == 1) write.table(writeout, file = "carensfails.csv",
                                   quote = FALSE, sep = ",", row.names = FALSE,
                                   col.names = FALSE, append = TRUE)
    }
  }
}