zachmayer / caretEnsemble

caret models all the way down :turtle:

trControl error in tunelist gbm1 alternatively modelCor(resamples(model_list_big)) fails for svmRadial #199

Closed sparcycram closed 8 years ago

sparcycram commented 8 years ago

# This R script is for GH simple model

# THIS CODE WILL RUN CORRECTLY, but uncommenting trControl in gbm1 or the RS line will cause the errors noted in the comments

# clear lists

rm(list = ls())

# Start the clock!

ptm <- proc.time()

setwd("c:/")

# run from here

TRAINGH <- read.csv("c:/TRAINGH.csv")
HOLDOUTGH <- read.csv("c:/HOLDOUT.csv")
TRAIN <- subset(TRAINGH, select = -c(X))
HOLDOUT <- subset(HOLDOUTGH, select = -c(X))

str(TRAIN)
str(HOLDOUT)

library('caret')
library('mlbench')
library('rpart')
library('caretEnsemble')
library('mlbench')
library('randomForest')
library("gbm")
library("kernlab")
library("e1071")
library("plyr")
library("class")
library("caTools")
library("stepPlr")
library("pamr")
library("MASS")
library("glmnet")
library("arm")

set.seed(1234)
inTrain <- createDataPartition(y = TRAIN$Classification, p = .75, list = FALSE)
training <- TRAIN[ inTrain,]
testing <- TRAIN[-inTrain,]

str(training)
getBinaryTargetLevel()
str(testing)
str(HOLDOUT)

# RS<-createResample(training$Classification, times=6)  # this causes an error for svmRadial if I leave it in, even if I don't use it????

MF<-createMultiFolds(training$Classification, k=2, times=5)

my_control <- trainControl(
  method='cv',
  savePredictions="final", # to make it smaller use savePredictions="final"; TRUE is very large
  classProbs=TRUE,
  index=MF,
  summaryFunction= twoClassSummary
)

# ---------------------parallel process begin----------------------

# install parallel

# install.packages("doParallel")

library(doParallel)
cl <- makeCluster(detectCores(), type = 'PSOCK')
registerDoParallel(cl)

cl

# NB to turn off parallel processing and run sequential again:

# registerDoSEQ()

# ---------------------parallel process end ------------------------

# set procedure time

ptm1 <- proc.time()
ptm2 <- proc.time()

set.seed(1234)
model_list_big <- caretList(
  Classification~., data=training,
  trControl=my_control,
  metric= "ROC",
  maximize=TRUE,
  methodList=c("pam","bayesglm","svmRadial"),
  tuneList=list(
    # if I uncomment trControl below in gbm1, it throws this error:
    # Error in data.frame(Fold1.Rep1 = c(1L, 3L, 5L, 7L, 8L, 10L, 12L, 14L, :
    # arguments imply differing number of rows: 3320, 3321
    gbm1=caretModelSpec(method="gbm", tuneGrid=data.frame(interaction.depth=seq(1,5, by=1),n.trees=seq(10,2000, by=10),
                                                          shrinkage=0.1, n.minobsinnode = 10)) #trControl=my_control)),
  )
)

model_list_big

# Stop the clock

elapsed_time <- (proc.time() - ptm1)/60
elapsed_time

modelCor(resamples(model_list_big))

caretstack_ensemble <- caretStack(
  model_list_big,
  method = "glmnet",
  dfmax = 4,
  metric= "ROC",
  maximize=TRUE,
  trControl=trainControl(
    method='boot',
    number=10,
    savePredictions="final",
    classProbs=TRUE,
    summaryFunction = twoClassSummary
  )
)

summary(caretstack_ensemble)

greedy_ensemble <- caretEnsemble(
  model_list_big,
  metric = "ROC",
  maximize=FALSE,
  trControl = trainControl(
    summaryFunction = twoClassSummary,
    classProbs = TRUE
  )
)

summary(greedy_ensemble)

# end

HOLDOUTGH.xlsx TRAINGH.xlsx
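The wording of the error above ("arguments imply differing number of rows: 3320, 3321") is what base R's `data.frame()` reports when it is handed columns of unequal length. One possible reading, which is a guess and not something confirmed in this thread, is that the resampling index list (whose folds can differ in size by one row) is being coerced to a data frame somewhere. The sketch below uses a made-up seven-row index list to show the base R behaviour only; it says nothing about where in caret or caretEnsemble such a call would occur.

```r
# Hypothetical index list: seven rows cannot be split into two equal folds.
idx <- list(
  Fold1.Rep1 = c(1L, 3L, 5L, 7L),  # 4 training rows
  Fold2.Rep1 = c(2L, 4L, 6L)       # 3 training rows
)

# data.frame() needs columns of compatible length, so this call fails.
msg <- tryCatch(
  {
    data.frame(idx)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The unequal lengths are what trigger the error.
fold_sizes <- lengths(idx)
print(fold_sizes)
```

Printing `lengths(MF)` on the real index list would show whether the folds in the original script differ in size in the same way.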

zachmayer commented 8 years ago

Please take some time to make the example minimal, by removing all the code not required to generate the error. Ideally, you can produce the error with 5-10 lines of minimal code.

Also, please generate a simulated dataset. I don't have time to download an excel file, install excel, convert it to a csv, and then read it into R.
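A simulated dataset can be built in a few lines of base R. The sketch below is only an illustration: the predictor names `x1` to `x3`, the sample size, and the class labels `"no"`/`"yes"` are assumptions; only the outcome column name `Classification` comes from the script above.

```r
set.seed(1234)
n <- 200  # assumed sample size, small enough to paste into an issue

# Three made-up numeric predictors.
sim <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = runif(n)
)

# Two-class outcome that depends on x1 and x2 through a logistic model.
p <- plogis(1.5 * sim$x1 - sim$x2)
sim$Classification <- factor(
  ifelse(runif(n) < p, "yes", "no"),
  levels = c("no", "yes")
)

str(sim)
table(sim$Classification)
```

Because the seed is fixed, anyone running this gets the same data frame, so no file needs to be attached.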

Finally, please add the checklist back to your issue (I included it for a reason!):

- [ ] Start a new R session
- [ ] Install the latest version of caretEnsemble: `devtools::install_github("zachmayer/caretEnsemble")`
- [ ] Install the latest version of caret: `update.packages(oldPkgs="caret", ask=FALSE)`
- [ ] [Write a minimal reproducible example](http://stackoverflow.com/a/5963610)
- [ ] run `sessionInfo()`

Check off all the boxes when you've completed them.

zachmayer commented 8 years ago

The following is a great guide on making minimal examples. Please read it thoroughly, and take it to heart: http://stackoverflow.com/a/5963610

sparcycram commented 8 years ago

Thanks

I read it but I got stuck on pulling in the data. Sorry, that's why I had to send in Excel files for the data. For the life of me I couldn't work out what dump() was doing.
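For reference, `dput()` and `dump()` both write an object out as R code that recreates it, which is what makes them useful for pasting data into an issue. A minimal sketch on a three-row slice of `iris` (the object name `small` is made up for the illustration):

```r
# A tiny object to share.
small <- head(iris[, c("Sepal.Length", "Species")], 3)

# dput() prints R code that rebuilds the object; paste that text into the issue.
dput(small)

# Round trip: capture the printed code and evaluate it again.
txt <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
same <- isTRUE(all.equal(small, rebuilt))
print(same)

# dump() does the same but writes "small <- ..." to a file,
# which can later be restored with source(path).
path <- tempfile(fileext = ".R")
dump("small", file = path)
writeLines(readLines(path))
```

The reader then only has to copy the `structure(...)` text and assign it to a name to get the same data frame.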

I have no idea why createResample is causing the issue, even when I am not using its result in later code.

PS I didn't cut down the data as the problem was intermittent depending on which data set I was using.

Apologies if it makes it more difficult on your side.

zachmayer commented 8 years ago

If you can isolate what about a particular dataset causes the problem to occur, you'll isolate the problem.

Then you can make a minimal example using something like the iris dataset (or something listed in `library(help = "datasets")`).
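As a starting point, a skeleton along these lines keeps the same moving parts as the original script (`createMultiFolds`, `trainControl` with `index`, `caretList`, `modelCor`) on a two-class subset of `iris`. It is a sketch under assumptions, not a reproduction of the bug: the methods `"glm"` and `"rpart"` are placeholders chosen to keep dependencies small, it assumes the installed caret and caretEnsemble versions accept the arguments used in the script above, and `twoClassSummary` needs the pROC package.

```r
library(caret)
library(caretEnsemble)

# Two-class subset of iris; droplevels() removes the unused "setosa" level.
dat <- droplevels(subset(iris, Species != "setosa"))

set.seed(1234)
folds <- createMultiFolds(dat$Species, k = 2, times = 5)

ctrl <- trainControl(
  method = "cv",
  index = folds,
  savePredictions = "final",
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

# Placeholder models; swap in the failing ones (svmRadial, gbm) one at a time.
models <- caretList(
  Species ~ ., data = dat,
  trControl = ctrl,
  metric = "ROC",
  methodList = c("glm", "rpart")
)

modelCor(resamples(models))
```

Adding back one suspect at a time (the `createResample` call, `svmRadial`, the `gbm` entry in `tuneList`) and rerunning should show which change first triggers the error.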

zachmayer commented 8 years ago

Please re-open the issue once you've isolated the problem. Thanks!