Open zee86 opened 7 years ago
Observed values for each component model are not the same. Please re-train the models with the same Y variable
caretEnsemble currently requires that all models have the same target variable. In this case, it looks like you end up with different target variables because you are using different imputations.
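To make the cause concrete, here is a minimal sketch (not the poster's exact code) that checks whether the target column agrees across two imputations. It assumes the target `Sepal.Width` itself contains missing values, as it does when NAs are generated over the whole data frame. The random `mask` is a stand-in for `prodNA()` so that the sketch depends only on mice; `obs`, `same_all`, and `same_obs` are illustrative names. Since mice fills in only the missing cells, rows where the target was observed should carry identical target values in every imputed dataset.

```r
library(mice)

set.seed(1)
iris.num <- iris[, 1:4]  # numeric columns only, Species dropped

# Stand-in for prodNA(): blank out roughly 10% of the cells at random.
mask <- matrix(runif(nrow(iris.num) * ncol(iris.num)) < 0.1,
               nrow = nrow(iris.num))
iris.mis <- iris.num
iris.mis[mask] <- NA

# Two imputations are enough to show the effect.
imp <- mice(iris.mis, m = 2, maxit = 5, method = "pmm",
            seed = 500, printFlag = FALSE)
x1 <- complete(imp, action = 1)
x2 <- complete(imp, action = 2)

# Rows whose target value was observed, i.e. never imputed.
obs <- !is.na(iris.mis$Sepal.Width)

# Compare the target across imputations: all rows vs. observed rows only.
same_all <- identical(x1$Sepal.Width, x2$Sepal.Width)
same_obs <- identical(x1$Sepal.Width[obs], x2$Sepal.Width[obs])
```

If `same_all` is `FALSE` while `same_obs` is `TRUE`, one possible workaround is to train every model on the `obs` rows only (with a shared resampling index), so that each model sees the same Y values. Whether this is enough to satisfy caretEnsemble's check in a given version is an assumption that would need to be verified.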
Hi, In the following code I create five imputed datasets, fit an SVM to each imputed dataset using the train function in caret, and then combine the resulting models using caretEnsemble, so that at the end I can predict each test set with the ensemble model. However, I get the following error: Error in check_bestpreds_obs(modelLibrary) : Observed values for each component model are not the same. Please re-train the models with the same Y variable
Is there any way to make caretEnsemble accept models trained on different data, or do you know of another R package that would let me ensemble these different trained models?
I appreciate any help. Thank you.
library(missForest)  # provides prodNA()
library(mice)
library(e1071)
library(caret)
library(caretEnsemble)

data <- iris

# Generate 10% missing values at random
iris.mis <- prodNA(iris, noNA = 0.1)

# Remove the categorical variable
iris.mis <- subset(iris.mis, select = -c(Species))

# Five imputations using mice with predictive mean matching (pmm)
imp <- mice(iris.mis, m = 5, maxit = 10, method = 'pmm', seed = 500)

# Save the five imputed datasets
x1 <- complete(imp, action = 1, include = FALSE)
x2 <- complete(imp, action = 2, include = FALSE)
x3 <- complete(imp, action = 3, include = FALSE)
x4 <- complete(imp, action = 4, include = FALSE)
x5 <- complete(imp, action = 5, include = FALSE)

# Apply the following method to each imputed set
form <- iris$Sepal.Width  # target column
n <- nrow(x1)             # all imputed datasets have the same number of rows
prop <- n %/% fold        # 'fold' (number of outer folds) must be defined beforehand; its value is not shown here
set.seed(7)
newseq <- rank(runif(n))
k <- as.factor((newseq - 1) %/% prop + 1)
i <- 1
CVfolds <- 10
CVrepeats <- 3
indexPreds <- createMultiFolds(x1[k != i, ]$Sepal.Width, CVfolds, CVrepeats)
ctrl <- trainControl(method = "repeatedcv", repeats = CVrepeats, number = CVfolds,
                     returnResamp = "all", savePredictions = "all", index = indexPreds)
fit1 <- train(Sepal.Width ~ ., data = x1[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit2 <- train(Sepal.Width ~ ., data = x2[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit3 <- train(Sepal.Width ~ ., data = x3[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit4 <- train(Sepal.Width ~ ., data = x4[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit5 <- train(Sepal.Width ~ ., data = x5[k != i, ], method = 'svmLinear2', trControl = ctrl)

# Combine the fitted models into a list
svm.fit <- list(fit1, fit2, fit3, fit4, fit5)

# Convert the list to a caretList
class(svm.fit) <- "caretList"

# Create the ensemble (this is where the error occurs)
svm.all <- caretEnsemble(svm.fit, method = 'svmLinear2')

# Predict the test set using the ensemble
fcast1 <- predict(svm.all, newdata = x1[k == i, ])
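On the question of combining models trained on different imputed datasets without caretEnsemble: one simple option is to average the predictions of the per-imputation models. The sketch below is only an illustration under stated assumptions. `average_predictions` is a hypothetical helper, not a function from caret or caretEnsemble. `lm()` is used as a lightweight stand-in for `train(..., method = 'svmLinear2')`, the random `mask` replaces `prodNA()`, and `test_rows` is an arbitrary hold-out chosen for the example. Any fitted objects with a `predict(model, newdata = ...)` method, including caret `train` objects, could be passed in the same way.

```r
library(mice)

# Hypothetical helper: average the predictions of several fitted models
# on the same new data. Returns one value per row of newdata.
average_predictions <- function(models, newdata) {
  preds <- do.call(cbind, lapply(models, predict, newdata = newdata))
  rowMeans(preds)
}

set.seed(1)
iris.num <- iris[, 1:4]  # numeric columns only, Species dropped

# Stand-in for prodNA(): blank out roughly 10% of the cells at random.
mask <- matrix(runif(nrow(iris.num) * ncol(iris.num)) < 0.1,
               nrow = nrow(iris.num))
iris.mis <- iris.num
iris.mis[mask] <- NA

# Five imputed datasets, collected in a list instead of x1 ... x5.
imp <- mice(iris.mis, m = 5, maxit = 5, method = "pmm",
            seed = 500, printFlag = FALSE)
imputed <- lapply(1:5, function(j) complete(imp, action = j))

# Arbitrary hold-out rows for the example (every fifth row).
test_rows <- seq(1, nrow(iris.mis), by = 5)

# One model per imputed dataset; lm() stands in for the SVM fits.
models <- lapply(imputed, function(d) {
  lm(Sepal.Width ~ ., data = d[-test_rows, ])
})

# Averaged prediction for the test rows of the first imputed dataset.
fcast1 <- average_predictions(models, newdata = imputed[[1]][test_rows, ])
```

Unlike stacking, plain averaging does not learn weights for the component models, so it does not need the models to share the same observed Y values during resampling.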