karruo closed this issue 7 years ago.
I narrowed this down to the xyf model: if you remove it, this code works.
It appears that xyf saves its numeric predictions as character vectors. This might be a bug in caret itself.
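One quick way to confirm this kind of mismatch is to look at the class of each model's saved out-of-fold predictions. The sketch below assumes each list element carries a `pred` data frame with a `pred` column, as `caret::train` objects do when `savePredictions` is enabled. `pred_classes()` is a hypothetical helper, and the mock list only stands in for a real caretList.

```r
# Hypothetical helper: report the storage class of each model's saved
# out-of-fold predictions (m$pred$pred), so a model that stores numbers
# as character strings stands out.
pred_classes <- function(model_list) {
  vapply(model_list, function(m) class(m$pred$pred)[1], character(1))
}

# Mock objects standing in for a caretList (illustration only)
mock_list <- list(
  lm  = list(pred = data.frame(pred = c(1.2, 3.4), obs = c(1, 3))),
  xyf = list(pred = data.frame(pred = c("1.1", "3.3"), obs = c(1, 3),
                               stringsAsFactors = FALSE))
)
pred_classes(mock_list)  # returns c(lm = "numeric", xyf = "character")
```

On a real caretList you would call `pred_classes(model_list)` right before `caretEnsemble()`; any model reported as `"character"` is a candidate for the error above.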
Interestingly, if you remove the tuneGrid parameter, it also works.
When modelList includes 'gam', 'gamLoess', and 'gamSpline', the same error message also occurs.
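This is because the models failed. You can't run gam and gamLoess/gamSpline in the same caretList (or even the same R session).
Basically, the two underlying packages don't play well together: caret's gam method uses the mgcv package, while gamLoess and gamSpline use the gam package. Both packages export functions with the same names (such as gam() and s()), so the namespaces collide and both fail. If you want to do GAMs, you have to pick one or the other.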
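If you build method lists programmatically, a small guard can catch this combination before any fitting starts. This is a sketch only: `check_gam_conflict()` is a hypothetical helper, not part of caret or caretEnsemble, and it only inspects a single method list, so it cannot detect a collision caused by loading both packages elsewhere in the same R session.

```r
# Hypothetical guard (not part of caret or caretEnsemble): stop early when a
# method list mixes caret's "gam" method (fitted with mgcv) with
# "gamLoess"/"gamSpline" (fitted with the gam package).
check_gam_conflict <- function(methods) {
  uses_mgcv    <- "gam" %in% methods
  uses_gam_pkg <- any(c("gamLoess", "gamSpline") %in% methods)
  if (uses_mgcv && uses_gam_pkg) {
    stop("method list mixes 'gam' (mgcv) with 'gamLoess'/'gamSpline' (gam); ",
         "pick one family of GAMs")
  }
  invisible(TRUE)
}

check_gam_conflict(c("glmnet", "gamSpline"))            # passes silently
try(check_gam_conflict(c("gam", "gamLoess", "rpart")))  # signals an error
```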
Yeah, a lot of apparent problems with caret / caretEnsemble are actually the underlying models failing in cryptic ways. I usually stick to the tried-and-true models, like gbm, ranger, and glmnet. Between those 3 you often don't need anything else.
Thank you very much for the quick and kind answers. ^^
Closed #210 https://github.com/zachmayer/caretEnsemble/issues/210.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/zachmayer/caretEnsemble/issues/210#event-830589980, or mute the thread https://github.com/notifications/unsubscribe-auth/AQDKZS32Uuy9DBw5YhVkYJolt_R59K2gks5q120bgaJpZM4JjBI3 .
No problem. Thanks for using my package!
Hi, when I ran into this error and looked into it a bit, I noticed it could be related to the format in which the respective functions return predictions: a one-column matrix vs. a vector, for example. I haven't had time to dig deeper since, but it may be worth checking whether that is the case. If so, it is possibly easy to ensure a consistent format?
Regards, Kari
NB: Thanks for a wonderfully useful package!
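If the cause is the container format, normalizing every model's output to a plain numeric vector before comparing them would remove that variable. A minimal sketch, assuming regression predictions with exactly one column; `as_pred_vector()` is a hypothetical helper, not something caret or caretEnsemble provides.

```r
# Hypothetical helper: coerce a model's predictions to a plain numeric vector,
# whether they arrive as a vector, a one-column matrix, or a one-column data frame.
as_pred_vector <- function(p) {
  if (is.data.frame(p)) p <- as.matrix(p)
  if (is.matrix(p)) {
    if (ncol(p) != 1L) stop("expected exactly one prediction column, got ", ncol(p))
    p <- p[, 1]
  }
  if (is.factor(p)) p <- as.character(p)  # avoid turning factor codes into numbers
  as.numeric(p)                           # also drops names and other attributes
}

as_pred_vector(matrix(c(1.5, 2.5), ncol = 1))  # c(1.5, 2.5)
as_pred_vector(c(a = 1.5, b = 2.5))            # c(1.5, 2.5)
```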
On 20.10.2016 at 16:38, Zach Mayer <notifications@github.com> wrote:
> No problem. Thanks for using my package!
Interesting. That sounds like it may be a bug in caret: caret should always return a consistent format (but that's hard, because the underlying packages can change).
I've found that some models (like extraTrees) return predictions as a character vector. I don't know why this happens. I was able to 'correct' the `Error in check_bestpreds_preds(modelLibrary)` by using code like this:

```r
# Convert the character predictions saved for extraTrees back to numbers
modelLibrary$extraTrees$pred$pred <- as.numeric(modelLibrary$extraTrees$pred$pred)
```
After modifying the predictions, caretEnsemble works as expected.
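The one-line workaround above can be generalized to every model in a list, so you don't have to know in advance which model stores character predictions. This is a sketch assuming a regression caretList whose elements expose `$pred$pred`; `fix_char_preds()` is a hypothetical helper. It only touches character predictions (classification models legitimately store factors) and stops instead of silently producing `NA` when a value cannot be parsed.

```r
# Hypothetical helper generalising the one-line fix to every model in a list:
# character predictions are converted to numeric; other types are left alone.
fix_char_preds <- function(model_list) {
  for (nm in names(model_list)) {
    p <- model_list[[nm]]$pred$pred
    if (is.character(p)) {
      num <- suppressWarnings(as.numeric(p))
      # Refuse to turn unparseable strings into NA without telling anyone
      if (any(is.na(num) & !is.na(p))) {
        stop("model '", nm, "' has predictions that are not numeric strings")
      }
      model_list[[nm]]$pred$pred <- num
    }
  }
  model_list
}

# Mock objects standing in for a caretList (illustration only)
mock_list <- list(
  lm         = list(pred = data.frame(pred = c(1.2, 3.4))),
  extraTrees = list(pred = data.frame(pred = c("1.1", "3.3"), stringsAsFactors = FALSE))
)
fixed <- fix_char_preds(mock_list)
class(fixed$extraTrees$pred$pred)  # "numeric"
```

With a real list you would run `modelLibrary <- fix_char_preds(modelLibrary)` before calling `caretEnsemble()` or `caretStack()`.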
P.S. I really like caretEnsemble!!!
If you want to submit a PR with a unit test + a fix, please do!
I wonder if this is a fix that should be applied to caret itself.
On Wed, Jul 5, 2017, 15:03 Zach Mayer <notifications@github.com> wrote:
> If you want to submit a PR with a unit test + a fix, please do!
gbm crashes for me. The following code needs to be modified to test all models:
```r
#### THIS SCRIPT TESTS WHICH MODEL PAIRS CAUSE caretEnsemble TO FAIL
# Assumes data frames `training` and `testing` already exist, with the numeric
# outcome in column 1. printPredMets() is the poster's own helper (not shown).
#devtools::install_github("zachmayer/caretEnsemble")
#update.packages(oldPkgs = "caret", ask = FALSE)
library("caret")          # must be attached before trainControl() is called
library("caretEnsemble")
sessionInfo()

adaptControl <- trainControl(method = "cv", number = 10, search = "random")
tuneLength <- 3           # assumed value; the original used tuneLength without defining it

# "extraTrees" and "gbm" were commented out of this list in the original
stackmodels <- c("rpart", "glm", "cubist", "earth", "bagEarth",
                 "lasso", "Rborist", "rlm", "nnet", "pcaNNet", "avNNet", "pcr", "ppr",
                 "enet", "blassoAveraged", "leapBackward", "BstLm", "gamboost",
                 "xgbTree", "svmLinear2")

# Append one failing combination to carensfails.csv
log_failure <- function(...) {
  write.table(paste(..., sep = ","), file = "carensfails.csv", quote = FALSE,
              sep = ",", row.names = FALSE, col.names = FALSE, append = TRUE)
}

# Test each unordered pair of methods once
for (twomodels in combn(stackmodels, 2, simplify = FALSE)) {
  print(twomodels)
  writeout <- paste(twomodels, collapse = ",")

  # Reset so a failed fit never reuses the previous pair's models
  model_list <- NULL
  try({
    set.seed(222)
    model_list <- caretList(
      x = training[, -1],
      y = training[, 1],
      trControl = adaptControl,
      methodList = twomodels
    )
    print(as.data.frame(predict(model_list, newdata = head(testing[, -1]))))
  })
  if (is.null(model_list)) {
    log_failure(writeout, "caretList")
    next
  }

  # Greedy (linear) ensemble of the pair
  failed <- TRUE
  try({
    greedy_ensemble <- caretEnsemble(model_list)
    print(summary(greedy_ensemble))
    model_preds <- data.frame(lapply(model_list, predict,
                                     newdata = testing[, -1], type = "raw"))
    ens_preds <- predict(greedy_ensemble, newdata = testing[, -1], type = "raw")
    model_preds$ensemble <- ens_preds
    print(varImp(greedy_ensemble))
    overRMSE <- -1
    allmodel <- "caretEnsGreedyGlm"
    printPredMets(predicted.outcomes = ens_preds, overRMSE = overRMSE, hypercount = "full")
    failed <- FALSE
  })
  if (failed) log_failure(writeout, "greedy")

  # Stacked ensembles, using each method of the pair as the meta-learner
  for (i in twomodels) {
    failed <- TRUE
    try({
      stack_ensemble <- caretStack(model_list, method = i,
                                   tuneLength = tuneLength, trControl = adaptControl)
      ens_preds <- predict(stack_ensemble, newdata = testing[, -1], type = "raw")
      overRMSE <- -1
      allmodel <- paste("caretEnstk", i)
      printPredMets(predicted.outcomes = ens_preds, overRMSE = overRMSE, hypercount = "full")
      failed <- FALSE
    })
    # Log inside the loop so every failing meta-learner is recorded, not just the last
    if (failed) log_failure(writeout, paste("stack", i))
  }
}
```
I am fitting an ensemble of models in a regression setting and have noticed the same issue with both caretEnsemble and caretStack. Some methods (model types) seem to lead to an error stating that the predictions are not all of the same type but mix numbers and characters. This is strange, since when predicting directly with each of the fitted individual models, all predictions are numeric. Below is a reproducible example. The exact error message is:
Error in check_bestpreds_preds(modelLibrary) : Component models do not all have the same type of predicitons. Predictions are a mix of numeric, character.
Here is the example:
```r
# Example based on the cpus data in MASS; follows Venables & Ripley,
# MASS 3rd ed., page 311
require(MASS)
require(caret)
require(caretEnsemble)

data(cpus)
y <- log10(cpus$perf)
X <- cpus[, 2:7]  # syct..chmax; column 8 is perf itself, so 2:8 would leak the response

fitCtrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                        savePredictions = "final")

models <- caretList(
  x = X, y = y, preProc = c("center", "scale"),
  tuneList = list(
    lm      = caretModelSpec(method = "lm"),
    pls     = caretModelSpec(method = "pls", tuneLength = 30),
    ridge   = caretModelSpec(method = "ridge",
                tuneGrid = expand.grid(.lambda = seq(0, .1, len = 15))),
    enet    = caretModelSpec(method = "enet",
                tuneGrid = expand.grid(.lambda = c(0, .01, 1), .fraction = seq(.05, 1, len = 20))),
    nnet    = caretModelSpec(method = "pcaNNet",
                tuneGrid = expand.grid(.decay = c(0, .01, .1), .size = 1:10)),
    svm     = caretModelSpec(method = "svmRadial", tuneLength = 10),
    mars    = caretModelSpec(method = "earth",
                tuneGrid = expand.grid(.degree = 1:2, .nprune = 2:38)),
    kknn    = caretModelSpec(method = "kknn",
                tuneGrid = expand.grid(.kmax = 1:20, .distance = 1:10, .kernel = "optimal")),
    rpart   = caretModelSpec(method = "rpart2", tuneLength = 10),
    som     = caretModelSpec(method = "xyf",
                tuneGrid = expand.grid(.xdim = seq(5, 45, 10), .ydim = seq(5, 45, 10),
                                       .xweight = c(.5, 1, 2), .topo = "hexagonal")),
    gbm     = caretModelSpec(method = "gbm",
                tuneGrid = expand.grid(.interaction.depth = seq(1, 7, 2), .n.trees = seq(100, 1000, 50),
                                       .shrinkage = c(.01, .1), .n.minobsinnode = c(5, 10))),
    treebag = caretModelSpec(method = "treebag"),
    # randomForest's argument is ntree (not ntrees)
    rf      = caretModelSpec(method = "rf", tuneLength = 10, ntree = 1000, importance = TRUE),
    cubist  = caretModelSpec(method = "cubist",
                tuneGrid = expand.grid(.committees = c(10, 50, 100, 200, 300, 500),
                                       .neighbors = c(0, 1, 3, 5, 7, 9)))
  ),
  trControl = fitCtrl
)

results <- resamples(models)
summary(results)

ensb <- caretEnsemble(models)  # gives an error about the type of predictions:
# Error in check_bestpreds_preds(modelLibrary) :
#   Component models do not all have the same type of predicitons.
#   Predictions are a mix of numeric, character.
```