zachmayer / caretEnsemble

caret models all the way down :turtle:

issue : CaretEnsemble with different trained models #223

Open zee86 opened 7 years ago

zee86 commented 7 years ago

Hi. In the following code I create five imputed datasets, fit an SVM to each imputed dataset with caret's train function, and then try to combine the resulting models with caretEnsemble, so that I can predict each test set with the ensemble model. However, I get the following error: Error in check_bestpreds_obs(modelLibrary) : Observed values for each component model are not the same. Please re-train the models with the same Y variable

Is there any way to make caretEnsemble accept models that were trained on different data, or do you know of another R package that would let me ensemble these separately trained models?

I appreciate any help. Thank you.

library(mice)
library(missForest) # provides prodNA(), which is used below
library(e1071)
library(caret)
library("caretEnsemble")

data <- iris

# Generate 10% missing values at random

iris.mis <- prodNA(iris, noNA = 0.1)

# Remove categorical variables

iris.mis <- subset(iris.mis, select = -c(Species))

# 5 imputations using mice pmm

imp <- mice(iris.mis, m = 5, maxit = 10, method = 'pmm', seed = 500)

# Save the 5 imputed datasets

x1 <- complete(imp, action = 1, include = FALSE)
x2 <- complete(imp, action = 2, include = FALSE)
x3 <- complete(imp, action = 3, include = FALSE)
x4 <- complete(imp, action = 4, include = FALSE)
x5 <- complete(imp, action = 5, include = FALSE)

# Apply the following method to each imputed set

form <- iris$Sepal.Width # target column
n <- nrow(x1) # all imputed datasets have the same number of rows
prop <- n %/% fold # note: `fold` is never defined in the code as posted
set.seed(7)
newseq <- rank(runif(n))
k <- as.factor((newseq - 1) %/% prop + 1)
i <- 1
CVfolds <- 10
CVrepeats <- 3
indexPreds <- createMultiFolds(x1[k != i, ]$Sepal.Width, CVfolds, CVrepeats)
ctrl <- trainControl(method = "repeatedcv", repeats = CVrepeats, number = CVfolds,
                     returnResamp = "all", savePredictions = "all", index = indexPreds)

fit1 <- train(Sepal.Width ~ ., data = x1[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit2 <- train(Sepal.Width ~ ., data = x2[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit3 <- train(Sepal.Width ~ ., data = x3[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit4 <- train(Sepal.Width ~ ., data = x4[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit5 <- train(Sepal.Width ~ ., data = x5[k != i, ], method = 'svmLinear2', trControl = ctrl)

# Combine the fitted models into a list

svm.fit <- list(fit1, fit2, fit3, fit4, fit5)

# Convert the list to a caretList

class(svm.fit) <- "caretList"

# Create the ensemble (this is where the error occurs)

svm.all <- caretEnsemble(svm.fit, method = 'svmLinear2')

# Predict the test set using the ensemble

fcast1 <- predict(svm.all, newdata = x1[k == i, ])

zachmayer commented 7 years ago

Observed values for each component model are not the same. Please re-train the models with the same Y variable

caretEnsemble currently requires that all the models have the same target variable. In this case, it looks like you end up with different target variables because you are using different imputations: Sepal.Width is itself imputed, so its values can differ between x1 through x5.
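
One way to see the mismatch, and a possible way around it without caretEnsemble, is sketched below. It is only an illustration under stated assumptions: the missing values are injected with base R rather than prodNA, the hold-out rows (`test_rows`) and the lighter 5-fold control are made up for the example, and averaging the per-imputation predictions is a simple pooling substitute, not something caretEnsemble does for you.

```r
library(mice)
library(caret)  # method = "svmLinear2" also needs the e1071 package installed

set.seed(500)

# Numeric iris columns with about 10% of each column set to NA
# (assumption: base R is used here in place of prodNA()).
iris.mis <- subset(iris, select = -Species)
for (col in names(iris.mis)) {
  iris.mis[sample(nrow(iris.mis), 15), col] <- NA
}

# Five pmm imputations, as in the question.
imp <- mice(iris.mis, m = 5, maxit = 10, method = "pmm",
            seed = 500, printFlag = FALSE)
imputed <- lapply(1:5, function(j) complete(imp, action = j))

# Diagnostic: is the target column identical in every imputed dataset?
# When Sepal.Width itself had NAs, this is usually FALSE, which is the
# situation the error message complains about.
targets <- lapply(imputed, function(d) d$Sepal.Width)
same_y <- all(vapply(targets[-1],
                     function(y) isTRUE(all.equal(y, targets[[1]])),
                     logical(1)))

# Hypothetical hold-out split, shared by all imputed datasets.
test_rows <- sample(nrow(iris.mis), 30)
ctrl <- trainControl(method = "cv", number = 5)

# One model per imputed training set.
fits <- lapply(imputed, function(d) {
  train(Sepal.Width ~ ., data = d[-test_rows, ],
        method = "svmLinear2", trControl = ctrl)
})

# Predict each imputed test set with its own model (one column per
# imputation), then pool by averaging across imputations.
preds <- mapply(function(fit, d) predict(fit, newdata = d[test_rows, ]),
                fits, imputed)
pooled <- rowMeans(preds)
```

Here `same_y` reports whether the component models would share one set of observed values, and `pooled` holds one averaged prediction per hold-out row. Whether plain averaging is appropriate depends on the analysis; it avoids the shared-target requirement but does not learn ensemble weights.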