Closed: sparcycram closed this issue 8 years ago
Please take some time to make the example minimal by removing all the code not required to generate the error. Ideally, you can reproduce the error with 5-10 lines of code.
Also, please generate a simulated dataset. I don't have time to download an Excel file, install Excel, convert it to a CSV, and then read it into R.
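As a rough sketch of what a simulated dataset could look like, something along these lines is usually enough. The column names (`x1`, `x2`, `x3`, `Classification`), the sample size, and the seed are placeholders chosen for illustration, not anything taken from the real data.

```r
# Simulated two-class dataset using base R only.
# All names and sizes here are illustrative assumptions.
set.seed(42)  # fixed seed so everyone gets the same data
n <- 100

sim <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = runif(n)
)

# Two-class outcome loosely related to x1, stored as a factor
# with plain alphabetic level names.
sim$Classification <- factor(
  ifelse(sim$x1 + rnorm(n) > 0, "yes", "no"),
  levels = c("no", "yes")
)

str(sim)
```

Because the data are generated in the script itself, anyone can paste the example into a fresh R session without downloading a file.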
Finally, please add the checklist back to your issue (I included it for a reason!):
- [ ] Start a new R session
- [ ] Install the latest version of caretEnsemble: `devtools::install_github("zachmayer/caretEnsemble")`
- [ ] Install the latest version of caret: `update.packages(oldPkgs="caret", ask=FALSE)`
- [ ] [Write a minimal reproducible example](http://stackoverflow.com/a/5963610)
- [ ] run `sessionInfo()`
Check off all the boxes when you've completed them.
The following is a great guide on making minimal examples. Please read it thoroughly, and take it to heart: http://stackoverflow.com/a/5963610
Thanks
I read it, but I got stuck on pulling in the data. Sorry, that's why I had to send the data as Excel files. For the life of me I couldn't work out what dump() was doing.
I have no idea why createResample is causing the issue, even when I am not using its result in later code.
PS: I didn't cut down the data because the problem was intermittent, depending on which dataset I was using.
Apologies if that makes it more difficult on your side.
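For anyone else who gets stuck at the same point: a minimal sketch of what `dput()` and `dump()` do, using a tiny made-up data frame (`df`, `x1`, and the values are hypothetical stand-ins for the real data). Both turn an object into R code that recreates it, so the data can be pasted into an issue as text.

```r
# Small hypothetical data frame standing in for the real training data.
df <- data.frame(
  x1 = c(1.5, 2.0, 3.25),
  Classification = factor(c("yes", "no", "yes"))
)

# dput() prints R code that rebuilds the object; that printed code
# is what gets pasted into the issue.
dput(df)

# dump() writes the same kind of code to a file, as an assignment to `df`.
tmp <- tempfile(fileext = ".R")
dump("df", file = tmp)

# source() re-runs the file; here it recreates `df` in a separate environment
# so the copy can be compared with the original.
env <- new.env()
source(tmp, local = env)
all.equal(df, env$df)
```

For a large dataset, `dput(head(df, 20))` keeps the pasted output short, assuming the problem still shows up on the subset.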
If you can isolate what about a particular dataset causes the problem to occur, you'll isolate the problem.
Then you can make a minimal example using something like the iris dataset (or something listed in `library(help = "datasets")`).
Please re-open the issue once you've isolated the problem. Thanks!
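As a starting point, here is one way a built-in dataset could be reshaped into a two-class problem. This is only a sketch: dropping the `setosa` species and renaming the outcome column to `Classification` are illustrative choices, not requirements.

```r
# Turn the built-in iris data into a two-class problem (base R only).
data(iris)

# Keep two of the three species and drop the now-unused factor level.
two_class <- droplevels(subset(iris, Species != "setosa"))

# Rename the outcome column; the new name is an illustrative assumption.
names(two_class)[names(two_class) == "Species"] <- "Classification"

str(two_class)
table(two_class$Classification)
```

If the error can be reproduced on a data frame like this one, the example no longer depends on any external files.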
# This R script is for the GH simple model
# THIS CODE WILL RUN CORRECTLY, but uncommenting gbm1 or RS will cause the errors noted in the comments
# clear lists
rm(list = ls())
# Start the clock!
ptm <- proc.time()
setwd("c:/")
# run from here
TRAINGH <- read.csv("c:/TRAINGH.csv")
HOLDOUTGH <- read.csv("c:/HOLDOUT.csv")
TRAIN <- subset(TRAINGH, select = -c(X))
HOLDOUT <- subset(HOLDOUTGH, select = -c(X))
str(TRAIN)
str(HOLDOUT)
library('caret')
library('mlbench')
library('rpart')
library('caretEnsemble')
library('randomForest')
library("gbm")
library("kernlab")
library("e1071")
library("plyr")
library("class")
library("caTools")
library("stepPlr")
library("pamr")
library("MASS")
library("glmnet")
library("arm")
set.seed(1234)
inTrain <- createDataPartition(y = TRAIN$Classification, p = .75, list = FALSE)
training <- TRAIN[ inTrain,]
testing <- TRAIN[-inTrain,]
str(training)
getBinaryTargetLevel()
str(testing)
str(HOLDOUT)
# RS <- createResample(training$Classification, times = 6)  # this causes an error for svmRadial if I leave it in, even if I don't use it????
MF <- createMultiFolds(training$Classification, k = 2, times = 5)
my_control <- trainControl(
  method = 'cv',
  savePredictions = "final",  # "final" keeps the object smaller; TRUE is very large
  classProbs = TRUE,
  index = MF,
  summaryFunction = twoClassSummary
)
# ---------------------parallel process begin----------------------
# install parallel
# install.packages("doParallel")
library(doParallel)
cl <- makeCluster(detectCores(), type = 'PSOCK')
registerDoParallel(cl)
cl
# NB to turn off parallel processing and run sequentially again:
# registerDoSEQ()
# ---------------------parallel process end ------------------------
# set procedure time
ptm1 <- proc.time()
ptm2 <- proc.time()
set.seed(1234)
model_list_big <- caretList(
  Classification ~ .,
  data = training,
  trControl = my_control,
  metric = "ROC",
  maximize = TRUE,
  methodList = c("pam", "bayesglm", "svmRadial"),
  tuneList = list(
    # if I uncomment trControl below in gbm1 it throws the error:
    # Error in data.frame(Fold1.Rep1 = c(1L, 3L, 5L, 7L, 8L, 10L, 12L, 14L, :
  )
)
model_list_big
# Stop the clock
elapsed_time <- (proc.time() - ptm1)/60
elapsed_time
modelCor(resamples(model_list_big))
caretstack_ensemble <- caretStack(
  model_list_big,
  method = "glmnet",
  dfmax = 4,
  metric = "ROC",
  maximize = TRUE,
  trControl = trainControl(
    method = 'boot',
    number = 10,
    savePredictions = "final",
    classProbs = TRUE,
    summaryFunction = twoClassSummary
  )
)
summary(caretstack_ensemble)
greedy_ensemble <- caretEnsemble(
  model_list_big,
  metric = "ROC",
  maximize = FALSE,
  trControl = trainControl(
    summaryFunction = twoClassSummary,
    classProbs = TRUE
  )
)
summary(greedy_ensemble)
# end
HOLDOUTGH.xlsx TRAINGH.xlsx