zachmayer / caretEnsemble

caret models all the way down :turtle:

Error when making prediction using a greedy ensemble model #206

Open gdronald opened 8 years ago

gdronald commented 8 years ago

Hi, I was testing the caretEnsemble package that you built and hit a strange error when making predictions on a test sample using a greedy ensemble model:

The error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  invalid type (list) for variable 'model3'

My script:

library("caret")
library("mlbench")
library("pROC")
data(Sonar)
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTrain,]
testing <- Sonar[-inTrain,]
my_control <- trainControl(
  method="boot",
  number=25,
  savePredictions="final",
  classProbs=TRUE,
  index=createResample(training$Class, 25),
  summaryFunction=twoClassSummary
)

table(training$Class)

library("rpart")
library("caretEnsemble")
model_list <- caretList(
  Class~., data=training,
  trControl=my_control,
  methodList=c("glm", "rpart"),
  tuneList=list(
    model3=caretModelSpec(method='gbm',preProcess=PP),
    model4=caretModelSpec(method='mlpWeightDecay', trace=FALSE, preProcess="pca"),
    model5=caretModelSpec(method="knn", preProcess=PP),
    model6=caretModelSpec(method="earth", preProcess=PP),
    model7=caretModelSpec(method="svmRadial", preProcess=PP),
    model8=caretModelSpec(method="glmnet", preProcess=PP)
    # model8=caretModelSpec(method="gam", preProcess=PP),
    # model9=caretModelSpec(method="glmnet", preProcess=PP)
  )
)

greedy_ensemble <- caretEnsemble(
  model_list,
  metric="ROC",
  trControl=trainControl(
    number=2,
    summaryFunction=twoClassSummary,
    classProbs=TRUE
  ))
summary(greedy_ensemble)

library("caTools")
model_preds <- lapply(model_list, predict, newdata=testing, type="prob")
model_preds <- lapply(model_preds, function(x) x[,"M"])
model_preds <- data.frame(model_preds)
ens_preds <- predict(greedy_ensemble, newdata=testing, type="prob")

Thank you for your help.

zachmayer commented 8 years ago

I get the following error when I try to run your code:

Error in caretModelSpec(method = "gbm", preProcess = PP) : 
  object 'PP' not found

Please fix that, and please also remove all code that's not 100% required to re-create the issue. In other words, please fulfill the "producing minimal code" requirement of a minimal, reproducible example:

This should be the easy part but often isn't. What you should not do, is:

  • add all kind of data conversions. Make sure the provided data is already in the correct format (unless that is the problem of course)
  • copy-paste a whole function / chunk of code that gives an error. First try to locate which lines exactly result in the error. More often than not you'll find out what the problem is yourself.
gdronald commented 8 years ago

Hi Zach, sorry for the missing code. Here is everything needed:

library("caret")
library("mlbench")
library("pROC")
data(Sonar)
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTrain,]
testing <- Sonar[-inTrain,]
my_control <- trainControl(
  method="boot",
  number=25,
  savePredictions="final",
  classProbs=TRUE,
  index=createResample(training$Class, 25),
  summaryFunction=twoClassSummary
)

PP <- c('center', 'scale')

library("rpart")
library("caretEnsemble")
model_list <- caretList(
  Class~., data=training,
  trControl=my_control,
  methodList=c("glm", "rpart"),
  tuneList=list(
    model3=caretModelSpec(method='gbm',preProcess=PP),
    model4=caretModelSpec(method='mlpWeightDecay', trace=FALSE, preProcess="pca"),
    model5=caretModelSpec(method="knn", preProcess=PP),
    model6=caretModelSpec(method="earth", preProcess=PP),
    model7=caretModelSpec(method="svmRadial", preProcess=PP),
    model8=caretModelSpec(method="glmnet", preProcess=PP)
  )
)

greedy_ensemble <- caretEnsemble(
  model_list,
  metric="ROC",
  trControl=trainControl(
    number=2,
    summaryFunction=twoClassSummary,
    classProbs=TRUE
  ))
summary(greedy_ensemble)

library("caTools")
model_preds <- lapply(model_list, predict, newdata=testing, type="prob")
model_preds <- lapply(model_preds, function(x) x[,"M"])
model_preds <- data.frame(model_preds)
ens_preds <- predict(greedy_ensemble, newdata=testing, type="prob")

The error occurred at the ens_preds instruction above.

Thank you.

zachmayer commented 8 years ago

I am able to run that code with no error:

ens_preds <- predict(greedy_ensemble, newdata=testing, type="prob")
> ens_preds
 [1] 0.22113926 0.46189214 0.83107713 0.26521616 0.89912570 0.05763430 0.56992742 0.04558505 0.05626683 0.08674816 0.06560082 0.52854768
[13] 0.30151370 0.08485551 0.10515085 0.05631715 0.06103305 0.09589297 0.23748462 0.37288981 0.76740240 0.25148869 0.05833128 0.06139028
[25] 0.92361332 0.68290920 0.81715373 0.48188282 0.47524734 0.66022054 0.89475132 0.94673095 0.86752851 0.72301821 0.83906958 0.90015828
[37] 0.81798337 0.90966529 0.83325843 0.35752291 0.81920851 0.61607431 0.81110927 0.92682841 0.92653518 0.92724546 0.93436422 0.88311111
[49] 0.88643939 0.87413880 0.88410738

What's your sessionInfo()? Mine is:

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-apple-darwin15.3.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] glmnet_2.0-5        foreach_1.4.3       Matrix_1.2-4        kernlab_0.9-24      earth_4.4.4         plotmo_3.1.4       
 [7] TeachingDemos_2.10  plotrix_3.6-1       caTools_1.17.1      RSNNS_0.4-7         Rcpp_0.12.5         plyr_1.8.3         
[13] gbm_2.1.1           survival_2.38-3     caretEnsemble_2.0.0 rpart_4.1-10        pROC_1.8            mlbench_2.1-1      
[19] caret_6.0-64        lattice_0.20-33     ggplot2_2.1.0       rmongodb_1.8.0      data.table_1.9.6   

loaded via a namespace (and not attached):
 [1] compiler_3.2.4     nloptr_1.0.4       bitops_1.0-6       iterators_1.0.8    tools_3.2.4        digest_0.6.9       lme4_1.1-11       
 [8] jsonlite_0.9.21    nlme_3.1-126       gtable_0.2.0       mgcv_1.8-12        SparseM_1.7        gridExtra_2.2.1    stringr_1.0.0     
[15] MatrixModels_0.4-1 stats4_3.2.4       grid_3.2.4         nnet_7.3-12        pbapply_1.2-0      minqa_1.2.4        reshape2_1.4.1    
[22] car_2.1-1          magrittr_1.5       scales_0.4.0       codetools_0.2-14   MASS_7.3-45        pbkrtest_0.4-6     colorspace_1.2-6  
[29] labeling_0.3       quantreg_5.21      stringi_1.0-1      munsell_0.4.3      chron_2.3-47      
gdronald commented 8 years ago

Thanks for the quick response. Here is mine (I'm using RStudio):

R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] caTools_1.17.1      glmnet_2.0-5        foreach_1.4.3       Matrix_1.2-4        mgcv_1.8-12
 [6] nlme_3.1-125        kernlab_0.9-24      earth_4.4.4         plotmo_3.1.4        TeachingDemos_2.10
[11] plotrix_3.6-1       RSNNS_0.4-7         Rcpp_0.12.4         plyr_1.8.3          gbm_2.1.1
[16] survival_2.38-3     caretEnsemble_2.0.0 rpart_4.1-10        pROC_1.8            mlbench_2.1-1
[21] caret_6.0-68        ggplot2_2.1.0       lattice_0.20-33

loaded via a namespace (and not attached):
 [1] zoo_1.7-13         modeltools_0.2-21  coin_1.1-2         reshape2_1.4.1     pbapply_1.2-1      colorspace_1.2-6
 [7] stats4_3.2.5       chron_2.3-47       nloptr_1.0.4       multcomp_1.4-5     stringr_1.0.0      MatrixModels_0.4-1
[13] munsell_0.4.3      gtable_0.2.0       mvtnorm_1.0-5      codetools_0.2-14   strucchange_1.5-1  SparseM_1.7
[19] quantreg_5.21      pbkrtest_0.4-6     TH.data_1.0-7      party_1.0-25       scales_0.4.0       stabs_0.5-1
[25] lme4_1.1-12        gridExtra_2.2.1    digest_0.6.9       stringi_1.0-1      grid_3.2.5         bitops_1.0-6
[31] quadprog_1.5-5     tools_3.2.5        sandwich_2.3-4     magrittr_1.5       mboost_2.6-0       car_2.1-2
[37] MASS_7.3-45        data.table_1.9.6   nnls_1.4           minqa_1.2.4        iterators_1.0.8    compiler_3.2.5
[43] nnet_7.3-12

It is really strange, because I tried it on another dataset and got the same type of error. I don't know if my session info helps. Thank you!

zachmayer commented 8 years ago

Very odd. If you have access to another environment, (e.g. a mac or linux), try reproducing it there.

zachmayer commented 8 years ago

Updated all my packages and still got no error. Updating to R 3.3.0 and trying again:

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-apple-darwin15.3.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] caTools_1.17.1      glmnet_2.0-5        foreach_1.4.3       Matrix_1.2-6        kernlab_0.9-24      earth_4.4.4        
 [7] plotmo_3.1.4        TeachingDemos_2.10  plotrix_3.6-2       RSNNS_0.4-7         Rcpp_0.12.5         plyr_1.8.3         
[13] gbm_2.1.1           survival_2.39-4     caretEnsemble_2.0.0 rpart_4.1-10        pROC_1.8            mlbench_2.1-1      
[19] caret_6.0-68        ggplot2_2.1.0       lattice_0.20-33    

loaded via a namespace (and not attached):
 [1] compiler_3.2.4     nloptr_1.0.4       bitops_1.0-6       iterators_1.0.8    tools_3.2.4        lme4_1.1-12        digest_0.6.9      
 [8] nlme_3.1-128       gtable_0.2.0       mgcv_1.8-12        SparseM_1.7        gridExtra_2.2.1    stringr_1.0.0      MatrixModels_0.4-1
[15] stats4_3.2.4       grid_3.2.4         nnet_7.3-12        data.table_1.9.6   pbapply_1.2-1      minqa_1.2.4        reshape2_1.4.1    
[22] car_2.1-2          magrittr_1.5       scales_0.4.0       codetools_0.2-14   MASS_7.3-45        rsconnect_0.4.3    pbkrtest_0.4-6    
[29] colorspace_1.2-6   quantreg_5.24      stringi_1.1.1      munsell_0.4.3      chron_2.3-47      
zachmayer commented 8 years ago

Still can't reproduce after a full update:

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin15.3.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] caTools_1.17.1      glmnet_2.0-5        foreach_1.4.3      
 [4] Matrix_1.2-6        kernlab_0.9-24      earth_4.4.4        
 [7] plotmo_3.1.4        TeachingDemos_2.10  plotrix_3.6-2      
[10] RSNNS_0.4-7         Rcpp_0.12.5         plyr_1.8.3         
[13] gbm_2.1.1           survival_2.39-4     caretEnsemble_2.0.0
[16] rpart_4.1-10        pROC_1.8            mlbench_2.1-1      
[19] caret_6.0-68        ggplot2_2.1.0       lattice_0.20-33    

loaded via a namespace (and not attached):
 [1] compiler_3.3.0     nloptr_1.0.4       bitops_1.0-6      
 [4] iterators_1.0.8    tools_3.3.0        lme4_1.1-12       
 [7] digest_0.6.9       nlme_3.1-128       gtable_0.2.0      
[10] mgcv_1.8-12        SparseM_1.7        gridExtra_2.2.1   
[13] stringr_1.0.0      MatrixModels_0.4-1 stats4_3.3.0      
[16] grid_3.3.0         nnet_7.3-12        data.table_1.9.6  
[19] pbapply_1.2-1      minqa_1.2.4        reshape2_1.4.1    
[22] car_2.1-2          magrittr_1.5       scales_0.4.0      
[25] codetools_0.2-14   MASS_7.3-45        pbkrtest_0.4-6    
[28] colorspace_1.2-6   quantreg_5.24      stringi_1.1.1     
[31] munsell_0.4.3      chron_2.3-47      

Try installing R 3.3.0 and updating all your packages.

gdronald commented 8 years ago

Hi, I tried R 3.3.0 through the latest version of RStudio on an Ubuntu VM (Linux), but I still get the same error:

Error in eval(expr, envir, enclos) : object 'model3' not found
15 eval(expr, envir, enclos)
14 eval(predvars, data, env)
13 model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)
12 model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
11 predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == "link", "response", type), terms = terms, na.action = na.action)
10 predict.glm(modelFit, newdata, type = "response")
9 predict(modelFit, newdata, type = "response")
8 predict(modelFit, newdata, type = "response") at glm.R#45
7 method$prob(modelFit = modelFit, newdata = newdata, submodels = param)
6 probFunction(method = object$modelInfo, modelFit = object$finalModel, newdata = newdata, preProc = object$preProcess)
5 predict.train(object$ens_model, newdata = preds, ...)
4 predict(object$ens_model, newdata = preds, ...)
3 predict.caretStack(greedy_ensemble, newdata = testing, type = "prob")
2 predict(greedy_ensemble, newdata = testing, type = "prob")
1 predict(greedy_ensemble, newdata = testing, type = "prob")

It's incredible. Do you use RStudio? Maybe I should try directly in R.

zachmayer commented 8 years ago

I'm using RStudio. Try copying and pasting the code you posted above into a fresh R session. I suspect you have a typo or a bug somewhere.

gdronald commented 8 years ago

I ran the test directly in R and I get the same error. I don't know what to do next.

zachmayer commented 8 years ago

Can you reproduce it on another computer?

I can't debug it if I can't get my computer to error in the same way =/

gdronald commented 8 years ago

Hi, done, but no change:

eds <- lapply(model_list, predict, newdata=testing, type="prob")
model_preds <- lapply(model_preds, function(x) x[,"M"])
model_preds <- data.frame(model_preds)
ens_preds <- predict(greedy_ensemble, newdata=testing, type="prob")
Error in eval(expr, envir, enclos) : object 'model3' not found

jrowen commented 7 years ago

I'm running into a similar error (unfortunately, I don't have a minimal example available).

library("caTools")

mdl_preds <- lapply(mod$models, predict, newdata = DT_test, type = "prob")
mdl_preds <- lapply(mdl_preds, function(x) x[, pos_outcm])
mdl_preds <- data.frame(mdl_preds)
# mod is caretStack class
mdl_preds$ensemble <- predict(mod, newdata = DT_test, type = "prob")

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  invalid type (list) for variable 'svmMod'

The other example above also generates the same error.

Error in eval(expr, envir, enclos) : object 'model3' not found

Let me know if there is anything I can do to assist with debugging.

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] caTools_1.17.1      glmnet_2.0-5        foreach_1.4.3       Matrix_1.2-6       
 [5] kernlab_0.9-24      earth_4.4.4         plotmo_3.1.4        TeachingDemos_2.10 
 [9] plotrix_3.6-2       RSNNS_0.4-7         Rcpp_0.12.5         plyr_1.8.4         
[13] gbm_2.1.1           survival_2.39-2     caretEnsemble_2.0.0 rpart_4.1-10       
[17] pROC_1.8            mlbench_2.1-1       caret_6.0-70        ggplot2_2.1.0      
[21] lattice_0.20-33    

loaded via a namespace (and not attached):
 [1] compiler_3.3.0     nloptr_1.0.4       bitops_1.0-6       iterators_1.0.8    tools_3.3.0       
 [6] lme4_1.1-12        digest_0.6.9       nlme_3.1-127       gtable_0.2.0       mgcv_1.8-12       
[11] SparseM_1.7        gridExtra_2.2.1    stringr_1.0.0      MatrixModels_0.4-1 stats4_3.3.0      
[16] grid_3.3.0         nnet_7.3-12        data.table_1.9.6   pbapply_1.2-1      minqa_1.2.4       
[21] reshape2_1.4.1     car_2.1-2          magrittr_1.5       scales_0.4.0       codetools_0.2-14  
[26] MASS_7.3-45        pbkrtest_0.4-6     colorspace_1.2-6   quantreg_5.26      stringi_1.1.1     
[31] munsell_0.4.3      chron_2.3-47    
zachmayer commented 7 years ago

I can't debug the error unless I can reproduce it, unfortunately.

jrowen commented 7 years ago

I'm able to reproduce this using the jrowen/dcaret docker image and the sample code above. This might allow you to reproduce locally.

topepo commented 7 years ago

I'm not sure if it is the same issue, but I have a similar error:

library(caret)
library(caretEnsemble)

set.seed(475)
train_data <- twoClassSim(500)
test_data <- twoClassSim(1000)

en_ctrl <- trainControl(method = "repeatedcv", repeats = 5,
                        savePredictions = "final",
                        classProbs = TRUE,
                        index = createMultiFolds(train_data$Class, times = 5),
                        summaryFunction = twoClassSummary)

en_mods <- list(lda = caretModelSpec(method="lda"),
                svm = caretModelSpec(method="svmRadial", 
                                     tuneLength = 8, 
                                     preProcess=c("center", "scale")),
                c5rules=caretModelSpec(method="C5.0Rules"))

mod_list <- caretList(Class ~ ., data=train_data,
                      trControl = en_ctrl,
                      metric="ROC",
                      tuneList = en_mods)

stack_ctrl <- trainControl(method="cv",
                           savePredictions="final",
                           classProbs=TRUE,
                           summaryFunction=twoClassSummary)

set.seed(234)
glm_stack <- caretStack(mod_list,
                        method="glm",
                        metric="ROC",
                        trControl= stack_ctrl)
glm_stack
predict(glm_stack, newdata = test_data, type = "prob")

ends with

 Error in eval(expr, envir, enclos) : object 'svm' not found 
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] C50_0.1.0-24        kernlab_0.9-24      MASS_7.3-45        
[4] caretEnsemble_2.0.0 caret_6.0-71        ggplot2_2.1.0      
[7] lattice_0.20-33    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5        compiler_3.3.1     nloptr_1.0.4      
 [4] plyr_1.8.4         iterators_1.0.8    tools_3.3.1       
 [7] partykit_1.0-5     lme4_1.1-12        digest_0.6.9      
[10] nlme_3.1-128       gtable_0.2.0       mgcv_1.8-12       
[13] Matrix_1.2-6       foreach_1.4.3      parallel_3.3.1    
[16] SparseM_1.7        gridExtra_2.2.1    stringr_1.0.0     
[19] pROC_1.8           MatrixModels_0.4-1 stats4_3.3.1      
[22] grid_3.3.1         nnet_7.3-12        data.table_1.9.6  
[25] pbapply_1.2-1      survival_2.39-4    minqa_1.2.4       
[28] reshape2_1.4.1     car_2.1-2          magrittr_1.5      
[31] scales_0.4.0       codetools_0.2-14   splines_3.3.1     
[34] pbkrtest_0.4-6     colorspace_1.2-6   quantreg_5.26     
[37] stringi_1.1.1      munsell_0.4.3      chron_2.3-47      
sparcycram commented 7 years ago

I am also getting a similar error. It's new and occurs in predict(greedy_ensemble,..) where greedy_ensemble <- caretEnsemble(.......

Error in eval(expr, envir, enclos) : object 'lda' not found

Strange, as it has not happened on my workstation yet, only on my new laptop.

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] doParallel_1.0.10 iterators_1.0.8 kohonen_2.0.19 C50_0.1.0-24 arm_1.9-1 lme4_1.1-12 glmnet_2.0-5 foreach_1.4.3
 [9] Matrix_1.2-7.1 earth_4.4.6 plotmo_3.2.0 TeachingDemos_2.10 plotrix_3.6-3 mda_0.4-9 pamr_1.55 cluster_2.0.4
[17] deepnet_0.2 stepPlr_0.92 xgboost_0.4-4 klaR_0.6-12 MASS_7.3-45 caTools_1.17.1 ada_2.0-5 class_7.3-14
[25] plyr_1.8.4 e1071_1.6-7 kernlab_0.9-24 gbm_2.1.1 survival_2.39-5 nnet_7.3-12 randomForest_4.6-12 rpart_4.1-10
[33] pROC_1.8 mlbench_2.1-1 ipred_0.9-5 caretEnsemble_2.0.0 caret_6.0-71 ggplot2_2.1.0 lattice_0.20-34

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7 digest_0.6.10 R6_2.1.3 chron_2.3-47 MatrixModels_0.4-1 stats4_3.3.1 coda_0.18-1 httr_1.2.1
 [9] curl_2.1 minqa_1.2.4 data.table_1.9.6 SparseM_1.72 car_2.1-3 nloptr_1.0.4 partykit_1.1-1 combinat_0.0-8
[17] devtools_1.12.0 stringr_1.1.0 munsell_0.4.3 compiler_3.3.1 mgcv_1.8-15 gridExtra_2.2.1 prodlim_1.5.7 codetools_0.2-14
[25] withr_1.0.2 bitops_1.0-6 grid_3.3.1 nlme_3.1-128 gtable_0.2.0 git2r_0.15.0 magrittr_1.5 scales_0.4.0
[33] stringi_1.1.1 pbapply_1.3-0 reshape2_1.4.1 Formula_1.2-1 lava_1.4.4 tools_3.3.1 abind_1.4-5 pbkrtest_0.4-6
[41] colorspace_1.2-6 memoise_1.0.0 quantreg_5.29

sparcycram commented 7 years ago

I rebooted my laptop and started a new session, and the code ran perfectly with no errors!

Restarting RStudio alone did not help; I had to reboot the laptop. Go figure.

hadjipantelis commented 7 years ago

Hello Zach,

First of all thank you for making caretEnsemble. Unfortunately, I have the same problem with predict. It is unusable for any model that is defined using tuneList. For example:

library(caretEnsemble)
set.seed(42)
models <- caretList(iris[1:50,1:2], iris[1:50,3],  methodList=c("glm"),
                    tuneList=list( rf1=caretModelSpec(method="rf",
                                      tuneGrid=data.frame(.mtry=2)),
                                   rf2=caretModelSpec(method="rf", 
                                      tuneGrid=data.frame(.mtry=1), preProcess="pca")))
ens <- caretEnsemble(models) 
predict(ens, newdata = iris[1:10,1:2]) # Fails to find rf1.
# Error in eval(expr, envir, enclos) : object 'rf1' not found 

errs. Interestingly summary seems to be fine:

summary(ens)
The following models were ensembled: rf1, rf2, glm 
They were weighted: 
1.0088 0.4529 -0.312 0.1769
The resulting RMSE is: 0.1687
The fit for each individual model on the RMSE is: 
 method      RMSE     RMSESD
    rf1 0.1835060 0.02469139
    rf2 0.1889357 0.02784562
    glm 0.1747355 0.02966760

Simply using:

models <- caretList(iris[1:50,1:2], iris[1:50,3],  methodList=c("glm", "lm"))
ens <- caretEnsemble(models) # Warnings about rank-deficiency but otherwise no errors
predict(ens, newdata = iris[1:10,1:2]) # Works fine (aside from the numerous warnings)

produces no errors. So I suspect something odd is going on during model look-up. For reference everything is updated to the latest CRAN stable.

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-71        ggplot2_2.1.0       lattice_0.20-33    
[4] randomForest_4.6-12 caretEnsemble_2.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5        magrittr_1.5       splines_3.3.1      MASS_7.3-45       
 [5] munsell_0.4.3      colorspace_1.2-6   foreach_1.4.3      pbapply_1.3-0     
 [9] minqa_1.2.4        stringr_1.0.0      car_2.1-2          plyr_1.8.4        
[13] tools_3.3.1        nnet_7.3-12        pbkrtest_0.4-6     parallel_3.3.1    
[17] grid_3.3.1         data.table_1.9.6   gtable_0.2.0       nlme_3.1-128      
[21] mgcv_1.8-12        quantreg_5.26      MatrixModels_0.4-1 iterators_1.0.8   
[25] digest_0.6.9       lme4_1.1-12        Matrix_1.2-6       gridExtra_2.2.1   
[29] nloptr_1.0.4       reshape2_1.4.1     codetools_0.2-14   stringi_1.1.1     
[33] compiler_3.3.1     scales_0.4.0       stats4_3.3.1       SparseM_1.7       
[37] chron_2.3-47 

I can reproduce this behaviour on a R 3.2.0 with Scientific Linux too:

sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Scientific Linux release 6.8 (Carbon)Santiago

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-71        ggplot2_2.1.0       lattice_0.20-31    
[4] randomForest_4.6-10 caretEnsemble_2.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5      magrittr_1.5     splines_3.2.0    MASS_7.3-40     
 [5] munsell_0.4.2    colorspace_1.2-6 foreach_1.4.3    pbapply_1.3-0   
 [9] minqa_1.2.4      stringr_1.0.0    car_2.0-25       plyr_1.8.4      
[13] tools_3.2.0      nnet_7.3-9       pbkrtest_0.4-2   parallel_3.2.0  
[17] grid_3.2.0       data.table_1.9.6 gtable_0.1.2     nlme_3.1-120    
[21] mgcv_1.8-6       quantreg_5.11    iterators_1.0.7  digest_0.6.8    
[25] lme4_1.1-9       Matrix_1.2-0     gridExtra_2.2.1  nloptr_1.0.4    
[29] reshape2_1.4.1   codetools_0.2-11 stringi_1.1.1    compiler_3.2.0  
[33] scales_0.4.0     stats4_3.2.0     SparseM_1.6      chron_2.3-47

I downgraded caret to version 6.0-70 and the issue remained. I then tried downgrading to 6.0-68 and the issue still remained, so I suspect it was not introduced by caret's recent updates, but I might be wrong...

hadjipantelis commented 7 years ago

Hello,

Do we have an update on this? I was following this up and I noticed something very odd. The example I attach will execute fine on Windows 7 but not on CentOS 7. I append the relevant sessionInfo() outputs. Are you able to reproduce it on Linux?

The Windows machine:

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-73        ggplot2_2.1.0       lattice_0.20-33     randomForest_4.6-12 caretEnsemble_2.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6        magrittr_1.5       splines_3.3.1      MASS_7.3-45        munsell_0.4.3      colorspace_1.2-6  
 [7] foreach_1.4.3      pbapply_1.3-1      minqa_1.2.4        stringr_1.0.0      car_2.1-3          plyr_1.8.4        
[13] tools_3.3.1        nnet_7.3-12        pbkrtest_0.4-6     parallel_3.3.1     grid_3.3.1         data.table_1.9.6  
[19] gtable_0.2.0       nlme_3.1-128       mgcv_1.8-12        quantreg_5.26      MatrixModels_0.4-1 iterators_1.0.8   
[25] digest_0.6.10      lme4_1.1-12        Matrix_1.2-6       gridExtra_2.2.1    nloptr_1.0.4       reshape2_1.4.1    
[31] ModelMetrics_1.1.0 codetools_0.2-14   stringi_1.1.1      compiler_3.3.1     scales_0.4.0       stats4_3.3.1      
[37] SparseM_1.7        chron_2.3-47       

The Linux machine:

R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-73        ggplot2_2.1.0       lattice_0.20-33    
[4] randomForest_4.6-12 caretEnsemble_2.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7        magrittr_1.5       splines_3.3.1      MASS_7.3-45       
 [5] munsell_0.4.3      colorspace_1.2-6   foreach_1.4.3      pbapply_1.3-0     
 [9] minqa_1.2.4        stringr_1.1.0      car_2.1-3          plyr_1.8.4        
[13] tools_3.3.1        nnet_7.3-12        pbkrtest_0.4-6     parallel_3.3.1    
[17] grid_3.3.1         data.table_1.9.6   gtable_0.2.0       nlme_3.1-128      
[21] mgcv_1.8-12        quantreg_5.29      MatrixModels_0.4-1 iterators_1.0.8   
[25] digest_0.6.10      lme4_1.1-12        Matrix_1.2-6       gridExtra_2.2.1   
[29] nloptr_1.0.4       reshape2_1.4.1     ModelMetrics_1.1.0 codetools_0.2-14  
[33] stringi_1.1.1      compiler_3.3.1     scales_0.4.0       stats4_3.3.1      
[37] SparseM_1.72       chron_2.3-47   
carloscinelli commented 7 years ago

Hi, Zach, do you have any updates on this bug?

hadjipantelis commented 7 years ago

Great news! Seems like the latest commit by @washcycle fixed this! Both the example I linked and the one from @topepo seem to work fine! @washcycle thank you very much for this! :D

My sessionInfo() is:

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] C50_0.1.0-24        kernlab_0.9-25      MASS_7.3-47         caret_6.0-76       
[5] ggplot2_2.2.1       lattice_0.20-35     randomForest_4.6-12 caretEnsemble_2.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10       compiler_3.3.3     nloptr_1.0.4       git2r_0.18.0      
 [5] plyr_1.8.4         iterators_1.0.8    tools_3.3.3        partykit_1.1-1    
 [9] digest_0.6.12      lme4_1.1-13        memoise_1.1.0      tibble_1.3.0      
[13] nlme_3.1-131       gtable_0.2.0       mgcv_1.8-17        Matrix_1.2-10     
[17] foreach_1.4.3      curl_2.6           parallel_3.3.3     SparseM_1.77      
[21] gridExtra_2.2.1    withr_1.0.2        httr_1.2.1         stringr_1.2.0     
[25] knitr_1.15.1       MatrixModels_0.4-1 devtools_1.13.0    stats4_3.3.3      
[29] grid_3.3.3         nnet_7.3-12        data.table_1.10.4  R6_2.2.1          
[33] pbapply_1.3-2      survival_2.41-3    Formula_1.2-1      minqa_1.2.4       
[37] reshape2_1.4.2     car_2.1-4          magrittr_1.5       scales_0.4.1      
[41] codetools_0.2-15   ModelMetrics_1.1.0 splines_3.3.3      pbkrtest_0.4-7    
[45] colorspace_1.3-2   quantreg_5.33      stringi_1.1.5      lazyeval_0.2.0    
[49] munsell_0.4.3