zachmayer / caretEnsemble

caret models all the way down :turtle:

Class predictions from ensemble classifiers are reversed #189

Closed eric-czech closed 8 years ago

eric-czech commented 8 years ago

I've been banging my head against this as I've run into it with several datasets now, but I think I've got a good reproducible example showing how caretEnsemble seems to be flipping class label predictions (it also appears to affect predicted probabilities, but it's easier to demonstrate with labels).

Am I doing something dumb here? I can't see any reason why the predictions from the ensemble below should be inverted, yet that's exactly what seems to happen:

library(caretEnsemble)
library(caret)
set.seed(123)

# Create a very easily classifiable dataset with a binary response
X <- matrix(c(rnorm(n = 1500, mean=5), rnorm(n = 1500, mean=-5)), nrow = 1000, ncol=3)
y <- apply(X, 1, function(x) sum(x))
y <- sapply(1 / (1 + exp(-y)), function(p) factor(ifelse(runif(1) < p, 'yes', 'no'), levels=c('no', 'yes')))

# Split the above into training and test data 
train.idx <- createDataPartition(y)[[1]]

# Fit a glm to training data and examine the predictions on test data
glm.model <- train(X[train.idx,], y[train.idx], method='glm')
glm.pred <- predict(glm.model, newdata=X[-train.idx,], type='raw')
confusionMatrix(table(glm.pred, y[-train.idx]))

# The results are great: the glm model makes accurate out-of-sample predictions
#      no   yes
# no  242   11
# yes   5   242

# Now do the same with a caret ensemble
model.list <- caretList(
  X[train.idx,], y[train.idx],
  trControl=trainControl(method="cv", number=10, savePredictions="final", classProbs=T),
  methodList=c("glmnet", "rda", "rf")
)
ens.model <- caretEnsemble(model.list)
ens.pred <- predict(ens.model, newdata=X[-train.idx,], type='raw')
confusionMatrix(table(ens.pred, y[-train.idx]))

# Now, somehow, the predicted class labels seem to be getting reversed:
#      no yes
# no    6 242
# yes 241  11
zachmayer commented 8 years ago

We're probably always sorting the class labels alphabetically (which probably also flips the probabilities when you ask for them)

I'll look into fixing this, but in the meantime, try coding your classes as 0/1 or "X0" and "X1" and see if you get the same problem.
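
Something like this should do the recode (a sketch, reusing the y from the reproducible example above):

# recode the outcome with unambiguous labels, e.g. "X0"/"X1"
y <- factor(ifelse(y == 'yes', 'X1', 'X0'), levels = c('X0', 'X1'))
# then re-run caretList()/caretEnsemble() on the recoded outcome and see
# whether the confusion matrix still comes out flipped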

eric-czech commented 8 years ago

Makes sense. I had given your suggestion a shot before with no luck, but I believe I've found the problem.

In makePredObsMatrix (line 241) the following code was being used to select the positive class:

positive <- as.character(unique(modelLibrary$obs)[2]) #IMPROVE THIS!

The problem with that is that unique returns values in order of appearance, so a different positive class is potentially being selected for each training fold. For example:

unique(factor(c('positive', 'negative')))[2] # = 'negative'
unique(factor(c('negative', 'positive')))[2] # = 'positive'

I've sent this PR your way with a fix and recapped the above here for the sake of documentation.
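
For reference, a deterministic alternative is to take the positive class from the factor's levels rather than from the order of appearance. A minimal sketch (not necessarily the exact code in the PR, and assuming modelLibrary$obs is a factor):

positive <- levels(modelLibrary$obs)[2] # stable regardless of which class appears first
levels(factor(c('positive', 'negative')))[2] # = 'positive'
levels(factor(c('negative', 'positive')))[2] # = 'positive'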

eric-czech commented 8 years ago

Fixed by PR 190

PoLabs commented 5 years ago

I'm sorry, I'm still a bit confused by this. I expect classes labeled 1 to be positive and to mean the same thing as 'positive' in both train and predict. Am I missing something? There still seems to be an inconsistency between caret models and caret ensembles.

modelnameA <- 'Linear Ensemble'
for(c in 1:6){
  # predicted class label, recoded to 1 = yes / 0 = no
  predA <- predict(Alist[[c]], pred.vectorA)
  resultsA.df[c,1] <- ifelse(predA %in% c("yes","Yes"), 1, 0)

  # predicted probability; the ensemble prediction is handled as a single
  # vector, the individual caret models as a data.frame with a column per class
  predA.raw <- predict(Alist[[c]], pred.vectorA, type='prob')
  if(modelnameA %in% c('Linear Ensemble', 'Stack Ensemble')){
    resultsA.df[c,2] <- as.double(predA.raw)
  } else {
    resultsA.df[c,2] <- as.double(predA.raw[1,2])
  }

  # cross-validated ROC from training ($error for the ensemble, $results for caret models)
  if(modelnameA %in% c('Linear Ensemble', 'Stack Ensemble')){
    resultsA.df[c,3] <- Alist[[c]]$error$ROC
  } else {
    resultsA.df[c,3] <- as.double(Alist[[c]]$results$ROC)
  }
}
print(resultsA.df) # linear ensemble

   bin       raw     train test
1:   1 0.2506535 0.9617082    0
2:   1 0.2414265 0.9640644    0
3:   0 0.9133324 0.9623150    0
4:   1 0.2604354 0.9579918    0
5:   1 0.1312498 0.9622277    0
6:   0 0.5286454 0.9685270    0

modelnameA <- 'bayesglm'
for(c in 1:6){
  # same loop as above, now for the individual bayesglm models in Alist
  predA <- predict(Alist[[c]], pred.vectorA)
  resultsA.df[c,1] <- ifelse(predA %in% c("yes","Yes"), 1, 0)

  predA.raw <- predict(Alist[[c]], pred.vectorA, type='prob')
  if(modelnameA %in% c('Linear Ensemble', 'Stack Ensemble')){
    resultsA.df[c,2] <- as.double(predA.raw)
  } else {
    resultsA.df[c,2] <- as.double(predA.raw[1,2])
  }

  if(modelnameA %in% c('Linear Ensemble', 'Stack Ensemble')){
    resultsA.df[c,3] <- Alist[[c]]$error$ROC
  } else {
    resultsA.df[c,3] <- as.double(Alist[[c]]$results$ROC)
  }
}
print(resultsA.df) # bayesglm

   bin        raw     train test
1:   1 0.75607332 0.6031932    0
2:   1 0.93098516 0.6635544    0
3:   1 0.81537941 0.6356258    0
4:   0 0.12287752 0.6427582    0
5:   0 0.21236349 0.6163513    0
6:   0 0.05765715 0.6893205    0
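
For the individual caret models at least, there's a quick way to double-check which class each probability column refers to (a sketch, assuming the Alist / pred.vectorA objects above and that Alist[[1]] is a caret train object):

# the outcome levels stored in a caret train object...
Alist[[1]]$levels
# ...are also the column names of predict(..., type='prob'),
# so the column-to-class mapping can be read off directly
colnames(predict(Alist[[1]], pred.vectorA, type = 'prob'))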
JanVanIm commented 5 years ago

I've got the same issue. Was the problem solved in the latest release?

PoLabs commented 5 years ago

I'm using the latest GitHub install, so I don't think so.

juanbretti commented 5 years ago

Same issue here. My temporary solution:

library(forcats)
library(magrittr) # for the %>% pipe
# swap the (reversed) class labels on the predicted factor
y_ensemble <- predict(gbm_ensemble, newdata=X_test, type="raw") %>% 
    fct_recode(Default = "Non-Default", `Non-Default` = "Default")
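
If you need the probabilities as well, the analogous workaround (assuming predict() returns a single vector of probabilities that corresponds to the reversed label, as in older caretEnsemble versions) would be something like:

p_ensemble <- predict(gbm_ensemble, newdata=X_test, type="prob")
p_ensemble <- 1 - p_ensemble # flip, since the probabilities refer to the swapped class
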
breezedu commented 4 years ago

lol, July 2019, I am having the same issue.

varelasebastian commented 4 years ago

also having the same issue...

kilianshi commented 3 years ago

Hello, it seems that the problem is not solved, or is it?

zachmayer commented 3 years ago

It looks like one instance of this was solved in 2016, but it appears to have regressed. I don't currently have time to dig into this issue and fix it, but I would be very happy to review and merge a pull request!