zachmayer / caretEnsemble

caret models all the way down :turtle:

Stacking Regressions using caretEnsemble function #179

Closed Mosquito00 closed 8 years ago

Mosquito00 commented 8 years ago

Dear all,

I would like to use the code from zachmayer. Unfortunately, I get an error in the following line:

"greedy = caretEnsemble(all.models, iter=1000L)"

Error: x$control$savePredictions is not TRUE

I tried to trace the problem back to the following line:

"all.models = caretList(X[train,], Y[train], trControl=myControl,methodList=c('gbm', 'blackboost'))"

This line also produces an error and a warning:

Error in sensitivity.default(data[, "pred"], data[, "obs"], lev[1]) : inputs must be factors
In addition: Warning message:
In train.default(list(crim = c(0.03237, 0.06905, 0.02985, 0.08829, : cannnot compute class probabilities for regression

I therefore assumed that the error originates in this second line, and I tried to convert X[train,] and Y[train] into factors, but that did not work as I expected.

The code is the following:

set.seed(40)

library(caret)
library(devtools)
library(caretEnsemble)

# Data
library(mlbench)
data(BostonHousing2)

X = model.matrix(cmedv~crim+zn+indus+chas+nox+rm+age+dis+
                 rad+tax+ptratio+b+lstat+lat+lon, BostonHousing2)[,-1]
X = data.frame(X)

Y = BostonHousing2$cmedv

train = runif(nrow(X)) <= .66

folds=5
repeats=1

# fold cross-validations are used as the resampling scheme.
myControl = trainControl(method='cv', summaryFunction=twoClassSummary,
                         number = folds, repeats = repeats,
                         classProbs=TRUE, savePredictions=TRUE,
                         index=createMultiFolds(Y[train], k=folds, times=repeats))

PP = c('center', 'scale')

names(all.models) = sapply(all.models, function(x) x$method)
sort(sapply(all.models, function(x) min(x$results$RMSE)))

# regression, elastic net regression, or greedy optimization.
print(all.models)

greedy = caretEnsemble(all.models, iter=1000L)
print(greedy)
sort(greedy$weights, decreasing=TRUE)
greedy$error

Any help would be appreciated.

zachmayer commented 8 years ago

Don't use twoClassSummary for regression.
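
For readers following along, here is a minimal sketch of what a regression-oriented trainControl might look like. It is an illustration, not code from this thread. The outcome `y` is made-up data standing in for Y[train], and `reg_control` is a name chosen here. The sketch leaves out summaryFunction=twoClassSummary and classProbs=TRUE, which only apply to classification, and it only runs if caret is installed.

```r
# Sketch of a regression-friendly resampling setup (assumes caret is installed).
# Differences from the posted call: no twoClassSummary, no classProbs, no repeats.
if (requireNamespace("caret", quietly = TRUE)) {
  set.seed(40)
  y <- rnorm(100)  # hypothetical numeric outcome standing in for Y[train]
  folds <- 5

  reg_control <- caret::trainControl(
    method = "cv",
    number = folds,
    savePredictions = TRUE,  # as in the original post
    index = caret::createMultiFolds(y, k = folds, times = 1)
  )

  # With no summaryFunction supplied, caret uses its default summary,
  # which reports RMSE-type metrics for numeric outcomes.
  str(reg_control$savePredictions)
}
```

Printing `reg_control$savePredictions` is a quick way to see how the installed caret version stores the setting.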

Mosquito00 commented 8 years ago

Hello zachmayer,

Thank you for your answer. I deleted the twoClassSummary, but the code still doesn't work.

I got the following error:

Error: x$control$savePredictions is not TRUE

zachmayer commented 8 years ago

Add savePredictions=TRUE to your trainControl.

Mosquito00 commented 8 years ago

My trainControl is:

myControl = trainControl(method='cv', summaryFunction=twoClassSummary, number = folds, repeats = repeats, classProbs=TRUE, savePredictions=TRUE, index=createMultiFolds(Y[train], k=folds, times=repeats))

savePredictions was already set to TRUE... Still, it did not work.

Did you run the code? Does it work on your computer?

Thank you.

zachmayer commented 8 years ago

Please provide a minimal reproducible example that I can copy and paste into a fresh R session to replicate the error:
http://stackoverflow.com/a/5963610


Mosquito00 commented 8 years ago

Hello zachmayer,

I already provided the code above...


Thank you.

zachmayer commented 8 years ago

How many lines of this script can you remove while still getting the error?


Mosquito00 commented 8 years ago

Actually, this is the shortest version of my code. There must be something wrong with either the trainControl() call:

myControl = trainControl(method='cv', summaryFunction=twoClassSummary, number = folds, repeats = repeats, classProbs=TRUE, savePredictions=TRUE, index=createMultiFolds(Y[train], k=folds, times=repeats))

or the caretEnsemble() call:

greedy = caretEnsemble(all.models, iter=1000L)

Did you also get the same error?

zachmayer commented 8 years ago

Is this a regression or classification problem? If it's regression, remove the twoClassSummary bit.


Mosquito00 commented 8 years ago

It is a regression problem...

set.seed(40)
library(caret)
library(devtools)
library(caretEnsemble)

library(mlbench)
data(BostonHousing2)

X = model.matrix(cmedv~crim+zn+indus+chas+nox+rm+age+dis+
                 rad+tax+ptratio+b+lstat+lat+lon, BostonHousing2)[,-1]
X = data.frame(X)
Y = BostonHousing2$cmedv

train = runif(nrow(X)) <= .66
folds=5
repeats=1

myControl = trainControl(method='cv', number = folds, repeats = repeats,
                         classProbs=TRUE, savePredictions=TRUE,
                         index=createMultiFolds(Y[train], k=folds, times=repeats))

PP = c('center', 'scale')

all.models = caretList(X[train,], Y[train], trControl=myControl,
                       methodList=c('gbm', 'blackboost'))

names(all.models) = sapply(all.models, function(x) x$method)

sort(sapply(all.models, function(x) min(x$results$RMSE)))

greedy = caretEnsemble(all.models, iter=1000L)
print(greedy)
sort(greedy$weights, decreasing=TRUE)
greedy$error

This is my code. I removed twoClassSummary and it still gives the same error.

Thank you.

zachmayer commented 8 years ago

Yup, this is a bug. all.models[[1]]$control$savePredictions is "all".

As a workaround, you can downgrade your version of caret. Newer versions of caret store savePredictions as "all", "final" or "none" (I think) rather than TRUE, which is why the check fails. I'll fix this in the next release of caretEnsemble.
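
The mismatch described above can be reproduced in base R without either package. This is a sketch only. The two lists are hypothetical stand-ins for a fitted model's `control` element, and `tolerant_check` is an assumed illustration of one possible fix, not the change that was actually merged into caretEnsemble.

```r
# Hypothetical stand-ins for the `control` element of two fitted caret models.
old_style <- list(savePredictions = TRUE)   # how older caret stored the setting
new_style <- list(savePredictions = "all")  # what the thread reports for newer caret

# A strict check in the spirit of the failing assertion: only a literal TRUE passes.
strict_check <- function(control) isTRUE(control$savePredictions)

# A tolerant check (assumption): accept TRUE as well as the character
# settings that keep hold-out predictions.
tolerant_check <- function(control) {
  sp <- control$savePredictions
  isTRUE(sp) ||
    (is.character(sp) && length(sp) == 1L && sp %in% c("all", "final"))
}

print(strict_check(old_style))
print(strict_check(new_style))
print(tolerant_check(new_style))
```

The strict check rejects the character value even though predictions were saved, which matches the "savePredictions is not TRUE" error seen when savePredictions=TRUE was passed explicitly.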

zachmayer commented 8 years ago

Fix here: https://github.com/zachmayer/caretEnsemble/pull/181

Mosquito00 commented 8 years ago

Thank you for your answer.

As I am new to R, I do not know how to downgrade the version of caret. Do you have any suggestions?

Maybe there is another way to perform a stacked regression? Do you happen to know another package for stacked regression, or a way to optimize the weights in a stacked regression?

Thank you very much in advance.

zachmayer commented 8 years ago

http://stackoverflow.com/questions/17082341/installing-older-version-of-r-package
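
One common way to install an older release of a CRAN package is `install_version()` from the remotes package (devtools also exports it). The wrapper below is a hedged sketch. `install_older` is a hypothetical helper name, and the version string must be taken from the package's CRAN archive, since this thread does not say which caret release to use.

```r
# Hypothetical helper: install a specific, older release of a CRAN package.
# `version` must be a real release string from the package's CRAN archive;
# no particular caret version is implied here.
install_older <- function(pkg, version) {
  stopifnot(is.character(pkg), length(pkg) == 1L,
            is.character(version), length(version) == 1L)
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("Install the 'remotes' package first: install.packages('remotes')")
  }
  # Downloads and installs the requested release from the CRAN archive.
  remotes::install_version(pkg, version = version)
}

# Usage (not run): install_older("caret", "<version from the CRAN archive>")
```

After installing, restart R and confirm the result with `packageVersion("caret")`.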

zachmayer commented 8 years ago

This should be fixed now.

zachmayer commented 8 years ago

Fix was here: https://github.com/zachmayer/caretEnsemble/pull/185