Open thomaskern opened 10 years ago
Thank you very much for the reply. I am sorry if my next question sounds naive, but I am not an expert and I am trying to learn. Does the error from the following code mean that caretEnsemble does not support multiclass? And do you know of any R package that would let me ensemble these separately trained models?
Error in check_bestpreds_obs(modelLibrary) : Observed values for each component model are not the same. Please re-train the models with the same Y variable
library(missForest) # provides prodNA()
library(mice)
library(e1071)
library(caret)
library("caretEnsemble")
data <- iris
# Generate 10% missing values at random
iris.mis <- prodNA(iris, noNA = 0.1)
# Remove the categorical variable
iris.mis <- subset(iris.mis, select = -c(Species))
# 5 imputations using mice pmm
imp <- mice(iris.mis, m=5, maxit = 10, method = 'pmm', seed = 500)
# Save the 5 imputed datasets
x1 <- complete(imp, action = 1, include = FALSE)
x2 <- complete(imp, action = 2, include = FALSE)
x3 <- complete(imp, action = 3, include = FALSE)
x4 <- complete(imp, action = 4, include = FALSE)
x5 <- complete(imp, action = 5, include = FALSE)
## Apply the following method to each imputed set
form <- iris$Sepal.Width # target column
n <- nrow(x1) # all imputed datasets have the same number of rows
prop <- n%/%fold # note: `fold` is never defined in this snippet
set.seed(7)
newseq <- rank(runif(n))
k <- as.factor((newseq - 1)%/%prop + 1)
i<-1
CVfolds <- 10
CVrepeats <- 3
indexPreds <- createMultiFolds(x1[k != i,]$Sepal.Width, CVfolds, CVrepeats)
ctrl <- trainControl(method = "repeatedcv", repeats = CVrepeats,number = CVfolds, returnResamp = "all", savePredictions = "all", index = indexPreds)
fit1 <- train(Sepal.Width ~., data = x1[k !=i, ],method='svmLinear2',trControl = ctrl)
fit2 <- train(Sepal.Width ~., data = x2[k != i, ],method='svmLinear2',trControl = ctrl)
fit3 <- train(Sepal.Width ~., data = x3[k != i, ],method='svmLinear2',trControl = ctrl)
fit4 <- train(Sepal.Width ~., data = x4[k != i, ],method='svmLinear2',trControl = ctrl)
fit5 <- train(Sepal.Width ~., data = x5[k != i, ],method='svmLinear2',trControl = ctrl)
# Combine the trained models into a list
svm.fit <- list(fit1, fit2, fit3, fit4, fit5)
# Convert the list to a caretList
class(svm.fit) <- "caretList"
# Create the ensemble; this is where the error occurs
svm.all <- caretEnsemble(svm.fit,method='svmLinear2')
Additional note: the code above creates five imputed datasets, fits an SVM to each one with caret's train function, and then combines the resulting models with caretEnsemble, so that in the end each test set can be predicted with the ensemble model.
This has nothing to do with multiclass. Please open a new issue.
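A likely reading of the error above, offered as a sketch rather than a confirmed diagnosis: prodNA() is applied to every column of iris, including Sepal.Width, so mice fills in the response differently in each imputed dataset, and the five models no longer share the same observed Y. The base R snippet below only imitates that situation; `miss`, `y1` and `y2` are illustrative names, and the random fill-ins stand in for the mice imputations.

```r
# Minimal sketch (base R only): two "imputed" copies of the same response
# agree where the value was known and can differ where it was filled in.
set.seed(500)
y <- iris$Sepal.Width
miss <- sample(length(y), 15)  # positions treated as missing (illustrative)

# Stand-ins for two imputed datasets: same known values, separate fill-ins.
y1 <- y
y1[miss] <- sample(y[-miss], length(miss), replace = TRUE)
y2 <- y
y2[miss] <- sample(y[-miss], length(miss), replace = TRUE)

# Known entries are untouched in both copies.
print(identical(y1[-miss], y2[-miss]))

# Positions where the two copies disagree; any such position means the
# two models would be trained against different observed values.
print(which(y1 != y2))
```

If that is what happens here, one option (an assumption, not something confirmed in this thread) is to introduce and impute missing values in the predictors only, so that every model is trained on the same response column.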
I have a problem using varImp with a continuous response variable under caretEnsemble. I used nnet, gbm and rf for the ensemble. The error is: "Error in varImp[, "%IncMSE"] : subscript out of bounds".
Is varImp not applicable to the continuous response case? While using caretEnsemble I also get this warning: "In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures."
@shreyaa94 Please post a reproducible example in the caret repo. (varImp is a caret function.)
@shreyaa94 "There were missing values in resampled performance measures": it's generally OK if a couple of models fail, which is what that warning tells you.
@zachmayer Hey Zach, I am new to caretEnsemble, and I ran into almost the same problem as above.
library("caret")
library("mlbench")
library("pROC")
data(Sonar)
# I manually add one more category to the `Class` variable
set.seed(123)
index <- as.integer(runif(n = 60,min = 1,max = 208))
Sonar$Class <- as.character(Sonar$Class)
Sonar$Class[index] <- "Q"
Sonar$Class <- as.factor(Sonar$Class)
# now there are three levels in Class
# [1] Q R Q Q R R R R R R R Q R R R R Q Q R R R R R R R Q Q Q R Q R R R Q R R R R R R Q R R R R ...
# Levels: M Q R
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTrain,]
testing <- Sonar[-inTrain,]
my_control <- trainControl(
method="boot",
number=25,
savePredictions="final",
classProbs=TRUE,
index=createResample(training$Class, 25)
)
library("rpart")
library("caretEnsemble")
model_list <- caretList(
Class~., data=training,
trControl=my_control,
tuneList = list(
gbm = caretModelSpec(method = "gbm", verbose = FALSE, tuneGrid = expand.grid(.n.trees = 100, .interaction.depth = 11, .shrinkage = 0.001, .n.minobsinnode = 10)),
rf = caretModelSpec(method = "rf", ntree = 100),
rpart = caretModelSpec(method = "rpart")
)
)
glm_ensemble <- caretStack(
model_list,
method="glm",
metric="ROC",
trControl=trainControl(
method="boot",
number=10,
savePredictions="final",
classProbs=TRUE,
summaryFunction=twoClassSummary
)
)
Then I will get an error message:
Error in check_caretList_model_types(list_of_models) :
Not yet implemented for multiclass problems
I've noticed that this is still an open issue, so I assume it is not easy to solve. Could you recommend any other R packages that would let me stack models quickly and conveniently?
Thanks for your time ;p
I don't know of a package that lets you stack multiclass models. Maybe take a look at MLR?
https://github.com/mlr-org/mlr
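For readers who want to see what multiclass stacking involves, here is a minimal hand-rolled sketch. It is not caretEnsemble's or mlr's implementation. It assumes the three-class iris data in place of the modified Sonar data, rpart and MASS::lda as stand-in base learners, and nnet::multinom as the meta-learner; `fold_id`, `oof` and `meta` are illustrative names.

```r
# Hand-rolled multiclass stacking sketch using recommended packages only.
library(rpart)  # base learner 1: classification tree
library(MASS)   # base learner 2: linear discriminant analysis
library(nnet)   # meta-learner: multinomial logistic regression

set.seed(107)
dat <- iris
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))
lv <- levels(dat$Species)

# Out-of-fold class probabilities. Only the first two classes are kept per
# model, because each model's probabilities sum to 1 and the third column
# would be redundant.
oof <- matrix(NA_real_, nrow(dat), 4)
colnames(oof) <- c(paste0("rpart_", lv[1:2]), paste0("lda_", lv[1:2]))

for (f in seq_len(k)) {
  train_rows <- fold_id != f
  test_rows <- !train_rows
  m_rpart <- rpart(Species ~ ., data = dat[train_rows, ], method = "class")
  m_lda <- lda(Species ~ ., data = dat[train_rows, ])
  p_rpart <- predict(m_rpart, dat[test_rows, ], type = "prob")
  p_lda <- predict(m_lda, dat[test_rows, ])$posterior
  oof[test_rows, 1:2] <- p_rpart[, lv[1:2]]
  oof[test_rows, 3:4] <- p_lda[, lv[1:2]]
}

# The meta-learner is trained on the out-of-fold probabilities.
meta_data <- data.frame(oof, Species = dat$Species)
meta <- multinom(Species ~ ., data = meta_data, trace = FALSE)
stacked_pred <- predict(meta, meta_data)

# Accuracy on the data the meta-learner was fitted to (optimistic).
print(mean(stacked_pred == dat$Species))
```

To predict on new data, the base learners would be refit on the full training set and their probabilities passed to `meta` with the same column names. The accuracy printed here is measured on the meta-learner's own training data, so a held-out test set is needed for an honest estimate.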
Any updates on multi-class classification?
I haven’t continued working on it yet
PR to add multiclass here: https://github.com/zachmayer/caretEnsemble/pull/260
I see that the Dev branch has made more progress on multi-class ensemble stacking, but unfortunately it is not finished yet. Do you plan on implementing this, and/or could you point me in the right direction so that I might be able to finish it? I don't understand what the problem or holdup is (no offense intended).