zachmayer / caretEnsemble

caret models all the way down :turtle:

Multi-class classification greedy optimization #8

Open thomaskern opened 10 years ago

thomaskern commented 10 years ago

I see that the Dev branch has made more progress on multi-class classification ensemble stacking, but unfortunately it is not finished yet. Do you plan on implementing this, and/or could you point me in the right direction so I might be able to finish it? I don't understand what the problem or holdup is (no offense intended).

zee86 commented 7 years ago

Thank you very much for the reply. I am sorry if my next question sounds naive or like a waste of your time, but I am not an expert and am trying to learn. Does the following error, produced by the code below, occur because caretEnsemble does not support multiclass? And do you know of any R package that would let me ensemble these different trained models?

Error in check_bestpreds_obs(modelLibrary) : Observed values for each component model are not the same. Please re-train the models with the same Y variable

    library(mice)
    library(missForest)  # provides prodNA(), used below
    library(e1071)
    library(caret)
    library("caretEnsemble")

data <- iris
#Generate 10% missing values at Random 
iris.mis <- prodNA(iris, noNA = 0.1)
#remove categorical variables
iris.mis <- subset(iris.mis, select = -c(Species))

# 5 Imputation using mice pmm

imp <- mice(iris.mis, m=5, maxit = 10, method = 'pmm', seed = 500)

# save 5 imputed dataset.
x1 <- complete(imp, action = 1, include = FALSE)
x2 <- complete(imp, action = 2, include = FALSE)
x3 <- complete(imp, action = 3, include = FALSE)
x4 <- complete(imp, action = 4, include = FALSE)
x5 <- complete(imp, action = 5, include = FALSE)

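## Apply the following method for each imputed set

form <- iris$Sepal.Width # target column
n <- nrow(x1)  # all imputed datasets have the same number of rows
fold <- 10  # assumption: `fold` was not defined in the original post; 10 matches CVfolds below
prop <- n%/%fold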
set.seed(7)
newseq <- rank(runif(n))
k <- as.factor((newseq - 1)%/%prop + 1)
i<-1
CVfolds <- 10
CVrepeats <- 3
  indexPreds <- createMultiFolds(x1[k != i,]$Sepal.Width, CVfolds, CVrepeats)
  ctrl <- trainControl(method = "repeatedcv", repeats = CVrepeats,number = CVfolds, returnResamp = "all", savePredictions = "all", index = indexPreds)

fit1 <- train(Sepal.Width ~., data = x1[k !=i, ],method='svmLinear2',trControl = ctrl)
fit2 <- train(Sepal.Width ~., data = x2[k != i, ],method='svmLinear2',trControl = ctrl)
fit3 <- train(Sepal.Width ~., data = x3[k != i, ],method='svmLinear2',trControl = ctrl)
fit4 <- train(Sepal.Width ~., data = x4[k != i, ],method='svmLinear2',trControl = ctrl)
fit5 <- train(Sepal.Width ~., data = x5[k != i, ],method='svmLinear2',trControl = ctrl)
# combine the fitted models into a list
svm.fit <- list(fit1, fit2, fit3, fit4, fit5)

# convert the list to a caretList
class(svm.fit) <- "caretList"

# create the ensemble; this is where the error occurs
svm.all <- caretEnsemble(svm.fit,method='svmLinear2')

Additional note: the code above creates five imputed datasets, fits an SVM to each one using caret's train function, and then combines the resulting models with caretEnsemble, so that in the end each test set can be predicted with the ensemble model.

zachmayer commented 7 years ago

This has nothing to do with multiclass. Please open a new issue.
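One plausible reading of that error, given the code above: Sepal.Width is itself among the columns that receive missing values, so each of the five imputed datasets carries a slightly different response vector, and the check that all component models share the same observed values fails. The base-R sketch below only illustrates the idea; `impute_once` and its random-draw imputation are hypothetical stand-ins, not what mice or caretEnsemble do internally.

```r
# Illustration only: fill the same missing target values five different ways,
# then test whether the five response vectors are identical.
set.seed(500)
y <- iris$Sepal.Width
missing_idx <- sample(seq_along(y), size = 15)  # positions treated as missing

# Hypothetical helper: replace the missing entries with random draws
# from the observed values (a crude stand-in for one imputation).
impute_once <- function(y, idx) {
  y[idx] <- sample(y[-idx], size = length(idx), replace = TRUE)
  y
}

y_list <- lapply(1:5, function(i) impute_once(y, missing_idx))

# TRUE only if every imputed response matches the first one exactly.
same_obs <- all(vapply(y_list[-1], identical, logical(1), y_list[[1]]))
print(same_obs)
```

If the check prints FALSE, the models were trained on different Y values. Keeping the target column out of the imputation step, so that every model sees the same response, is one way to avoid that.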

shreyaa94 commented 7 years ago

I have a problem using varImp with a continuous response variable in caretEnsemble. I used nnet, gbm and rf for the ensemble. The error is: "Error in varImp[, "%IncMSE"] : subscript out of bounds".

shreyaa94 commented 7 years ago

Is varImp not applicable to a continuous response? When using caretEnsemble I also get this warning: "In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures."

zachmayer commented 7 years ago

@shreyaa94 Please post a reproducible example in the caret repo. (varImp is a caret function).

zachmayer commented 7 years ago

@shreyaa94 "There were missing values in resampled performance measures": it's generally OK if a couple of models fail, which is what that warning tells you.
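For a continuous response, one common way a resample ends up with a missing performance value is an R-squared computed as a squared correlation when the model predicts the same value for every held-out row. A base-R sketch of that situation, assuming the metric is a squared correlation (the vectors are made up for illustration):

```r
obs  <- c(2.1, 3.4, 2.9, 3.8, 3.0)   # made-up held-out observations
pred <- rep(3.0, length(obs))        # a degenerate model: constant predictions

# cor() is undefined when one input has zero variance, so it returns NA
# (with a warning, suppressed here).
r2 <- suppressWarnings(cor(pred, obs)^2)
print(is.na(r2))

# RMSE is still well defined for the same predictions.
rmse <- sqrt(mean((pred - obs)^2))
print(rmse)
```

A handful of such resamples is usually harmless; many of them suggest that some tuning-parameter combinations produce constant predictions.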

JackHo327 commented 6 years ago

@zachmayer Hey Zach, I am new to caretEnsemble, and I ran into almost the same problem as above.

library("caret")
library("mlbench")
library("pROC")
data(Sonar)

# I manually add one more category to the `Class` variable
set.seed(123)
index <- as.integer(runif(n = 60,min = 1,max = 208))
Sonar$Class <- as.character(Sonar$Class)
Sonar$Class[index] <- "Q"
Sonar$Class <- as.factor(Sonar$Class)

# now there are three levels in Class
# [1] Q R Q Q R R R R R R R Q R R R R Q Q R R R R R R R Q Q Q R Q R R R Q R R R R R R Q R R R R ...
# Levels: M Q R

set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTrain,]
testing <- Sonar[-inTrain,]
my_control <- trainControl(
      method="boot",
      number=25,
      savePredictions="final",
      classProbs=TRUE,
      index=createResample(training$Class, 25)
)

library("rpart")
library("caretEnsemble")
model_list <- caretList(
      Class~., data=training,
      trControl=my_control,
      tuneList = list(
            gbm = caretModelSpec(method = "gbm", verbose = FALSE, tuneGrid = expand.grid(.n.trees = 100, .interaction.depth = 11, .shrinkage = 0.001, .n.minobsinnode = 10)),
            rf = caretModelSpec(method = "rf", ntree = 100),
            rpart = caretModelSpec(method = "rpart")
      )
)

glm_ensemble <- caretStack(
      model_list,
      method="glm",
      metric="ROC",
      trControl=trainControl(
            method="boot",
            number=10,
            savePredictions="final",
            classProbs=TRUE,
            summaryFunction=twoClassSummary
      )
)

Then I will get an error message:

Error in check_caretList_model_types(list_of_models) : 
  Not yet implemented for multiclass problems

I noticed that this issue is still open, so I assume it is not easy to solve. Could you recommend any other R packages that would help me stack models quickly and conveniently?

Thanks for your time ;p

zachmayer commented 6 years ago

I don't know of a package that lets you stack multiclass models. Maybe take a look at MLR?

https://github.com/mlr-org/mlr
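In the absence of a stacking package, one manual fallback is to average the class probabilities of the fitted models (soft voting). This is simpler than stacking because no meta-model is trained. The sketch below uses made-up probability matrices in place of real model output; with caret models, such matrices would typically come from predict(model, newdata, type = "prob"). The names `probs_gbm`, `probs_rf` and `probs_rpart` are hypothetical.

```r
classes <- c("M", "Q", "R")

# Made-up class probabilities for 4 test rows from three models
# (rows = observations, columns = classes; each row sums to 1).
probs_gbm <- matrix(c(0.6, 0.1, 0.3,
                      0.2, 0.5, 0.3,
                      0.1, 0.2, 0.7,
                      0.4, 0.4, 0.2),
                    ncol = 3, byrow = TRUE, dimnames = list(NULL, classes))
probs_rf <- matrix(c(0.5, 0.2, 0.3,
                     0.3, 0.3, 0.4,
                     0.2, 0.1, 0.7,
                     0.3, 0.5, 0.2),
                   ncol = 3, byrow = TRUE, dimnames = list(NULL, classes))
probs_rpart <- matrix(c(0.7, 0.1, 0.2,
                        0.1, 0.6, 0.3,
                        0.3, 0.3, 0.4,
                        0.2, 0.6, 0.2),
                      ncol = 3, byrow = TRUE, dimnames = list(NULL, classes))

# Soft voting: element-wise mean of the probability matrices.
avg_probs <- (probs_gbm + probs_rf + probs_rpart) / 3

# Predicted class = column with the highest averaged probability.
pred_class <- factor(classes[max.col(avg_probs, ties.method = "first")],
                     levels = classes)
print(pred_class)
```

Equal weights are the simplest choice here; weighting each model by its resampled accuracy would be a natural variation.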


aminghari commented 6 years ago

Any Updates on Multi-class classification?

zachmayer commented 6 years ago

I haven’t continued working on it yet


zachmayer commented 2 weeks ago

PR to add multiclass here: https://github.com/zachmayer/caretEnsemble/pull/260