Closed brent-halen closed 1 month ago
The problem is that your models all use different re-sampling folds, because you do not explicitly define them in your trainControl.
Please use the caretList helper function:
library(caret)
col <- c(rnorm(10)*2000)
Data <- data.frame(
X = sample(1:10),
Y = sample(c("yes", "no"), 10, replace = TRUE)
)
Data <- cbind(Data,col)
colnames(Data)[3] <- "loss"
dmy <- dummyVars(loss~ ., data = Data)
Data.1 <- predict(dmy, newdata=Data)
Data.1.df <- as.data.frame(Data.1)
Data <- Data.1.df
Data <- cbind(Data,col)
colnames(Data)[4] <- "loss"
library(elasticnet)
library(pls)
library(nnet)
library(e1071)
library(randomForest)
library(gbm)
library(plyr)
library(MASS)
library(caretEnsemble)
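# caretList applies the same resampling folds to every model in methodList
control <- trainControl(method="repeatedcv", number=10, repeats=3, verboseIter=TRUE, savePredictions="final")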
models <- caretList(loss~., data = as.data.frame(Data), methodList = c('glm', 'svmRadial', 'svmPoly', 'elm', 'nnet', 'rf'), trControl=control)
caretStack(models, method = "rf")
caretStack(models, method = "gbm", tuneGrid=expand.grid(n.minobsinnode=1, n.trees=10, interaction.depth=1, shrinkage=0.1))
If you must create the models one at a time, you MUST pass an explicit index to trainControl so that every model is resampled on the same folds:
library(caret)
col <- c(rnorm(10)*2000)
Data <- data.frame(
X = sample(1:10),
Y = sample(c("yes", "no"), 10, replace = TRUE)
)
Data <- cbind(Data,col)
colnames(Data)[3] <- "loss"
dmy <- dummyVars(loss~ ., data = Data)
Data.1 <- predict(dmy, newdata=Data)
Data.1.df <- as.data.frame(Data.1)
Data <- Data.1.df
Data <- cbind(Data,col)
colnames(Data)[4] <- "loss"
library(elasticnet)
library(pls)
library(nnet)
library(e1071)
library(randomForest)
library(gbm)
library(plyr)
library(MASS)
library(caretEnsemble)
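# Build one set of repeated-CV training indexes and share it across every train() call
index <- createMultiFolds(Data[['loss']], k=10, times=3)
# loss is numeric, so this is a regression problem and classProbs does not apply
control <- trainControl(method="repeatedcv", number=10, repeats=3, verboseIter=TRUE, index=index, savePredictions="final")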
model1 <- train(loss~., data = as.data.frame(Data), method='glm', trControl=control)
model2 <- train(loss~., data = as.data.frame(Data), method='svmRadial', trControl=control)
model3 <- train(loss~., data = as.data.frame(Data), method='svmPoly', trControl=control)
model4 <- train(loss~., data = as.data.frame(Data), method='elm', trControl=control)
model5 <- train(loss~., data = as.data.frame(Data), method='nnet', trControl=control)
model6 <- train(loss~., data = as.data.frame(Data), method='rf', trControl=control)
#model7 <- train(loss~., data = as.data.frame(Data), method='lasso', trControl=control) #Always fails
models <- c(model1, model2, model3, model4, model5, model6)
caretStack(models, method = "rf")
caretStack(models, method = "gbm", tuneGrid=expand.grid(n.minobsinnode=1, n.trees=10, interaction.depth=1, shrinkage=0.1))
models <- c(model1,model3)
caretStack(models,method = "rf")
caretStack(models,method = "gbm", tuneGrid=expand.grid(n.minobsinnode=1, n.trees=10, interaction.depth=1, shrinkage=0.1))
In caretEnsemble 4.0 we don't require identical indexes, but we do require identical training rows.
Note that even if the training rows are different, it's still useful to make a caretList, because you can use this structure to predict on new data.
Finally, in 4.0 we allow transfer learning in caretStack, so if the models in the caretList were trained on different rows, you can use newdata to ensemble them!
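If you are not sure whether two separately trained models share resampling folds, you can compare the indexes that caret stores on each train object. The following is a minimal sketch, not part of caretEnsemble: the toy data, the model formulas, and the helper name same_resamples are made up for illustration, and it only assumes that train() keeps the supplied folds in $control$index.

```r
library(caret)

set.seed(42)
# Toy regression data (hypothetical, for illustration only)
toy <- data.frame(x = rnorm(40), z = rnorm(40))
toy$y <- 2 * toy$x - toy$z + rnorm(40)

# One set of repeated-CV training indexes, reused by every model
shared_index <- createMultiFolds(toy$y, k = 5, times = 2)
ctrl_shared <- trainControl(method = "repeatedcv", number = 5, repeats = 2,
                            index = shared_index, savePredictions = "final")

fit_a <- train(y ~ ., data = toy, method = "lm", trControl = ctrl_shared)
fit_b <- train(y ~ x, data = toy, method = "lm", trControl = ctrl_shared)

# No explicit index: train() draws its own folds for this model
ctrl_free <- trainControl(method = "repeatedcv", number = 5, repeats = 2,
                          savePredictions = "final")
fit_c <- train(y ~ ., data = toy, method = "lm", trControl = ctrl_free)

# Hypothetical helper: TRUE when both models were resampled on the same folds
same_resamples <- function(a, b) {
  identical(a$control$index, b$control$index)
}

same_resamples(fit_a, fit_b)  # both models used shared_index
same_resamples(fit_a, fit_c)  # fit_c drew its own folds, so these normally differ
```

Running a check like this on your own models before calling caretStack should show quickly whether mismatched folds are the likely cause of the error.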
I ran into this issue while working on a different data set/project, but the minimal dataset I detailed below seems to reproduce the issue. However, I do get several warnings running the below code that I don't get when using my other data. If it's necessary, I'll try to construct a more representative facsimile of my data.
When I attempt to use the 'caretStack' function, I'm getting a strange error message:
"Error { .... is not TRUE"
I'm including a screenshot for verification.
https://imgur.com/JbDFqlR
I have no idea how to go about fixing the problem, because I can't tell what is actually broken. I get this error on both Windows 10 and Ubuntu 14.04.
Minimal dataset:
Minimal, runnable code:
Session Info:
If there's anything else I need to provide, let me know.