Two errors depending on metric: auc_(actual, predicted, ranks) | evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels,

[X] Start a new R session
[X] Install the latest version of caretEnsemble: devtools::install_github("zachmayer/caretEnsemble")
[X] Install the latest version of caret: update.packages(oldPkgs="caret", ask=FALSE)
[X] Write a minimal reproducible example
[X] Run sessionInfo()

Issue:

I'm attempting to classify a binary response variable with the code in the first reproducible example below. I would like to use ROC as the metric, but doing so produces the following error:

Error in evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels, : train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()

When I leave metric out of the call (I assume it then defaults to Accuracy, since the response variable is a factor), I get a different error:

Error in auc_(actual, predicted, ranks) : Not compatible with requested type: [type=list; target=double].

While trying to build a minimal reproducible example with the UCI Breast Cancer dataset, I ran into an entirely different error, so I've opted to include the actual data that caused the errors above, since those are the higher priority to resolve. The code with the actual data is directly below, and the minimal UCI example is at the bottom of this post. All required packages are listed in the req.packages vector.

Any assistance is appreciated!

Exact, reproducible example with actual data:

set.seed(1)
req.packages <- c("doParallel", "kernlab", "caTools", "C50", "parallel", "iterators", "MASS", "foreach", "caret", "tidyverse", "dplyr", "htmltools", "magrittr")
# Attach every required package quietly
for (q in seq_along(req.packages)) {
  suppressPackageStartupMessages(library(req.packages[q], character.only = TRUE))
}
repmis::source_data("https://github.com/yogat3ch/da5030/blob/master/matchedlevels.RData?raw=true")

# Resampling indices, then the trainControl object that uses them
data.train <- caret::createMultiFolds(matchedlevels$olddata[["Deductible"]], times = 2)
data.train <- caret::trainControl(method = "repeatedcv",
                                  index = data.train,
                                  number = 10,
                                  repeats = 1,
                                  search = "grid",
                                  allowParallel = TRUE,
                                  classProbs = TRUE,
                                  savePredictions = "all",
                                  summaryFunction = caret::twoClassSummary,
                                  returnResamp = "all")
form <- as.formula(paste0("Deductible", " ~ ."))

# Parallel backend
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)
getDoParWorkers()

# Note: tuneGrids is not defined in this snippet
mod.list <- caretEnsemble::caretList(
  formula = form,
  data = matchedlevels$olddata,
  trControl = data.train,
  methodList = c("svmRadial", "LogitBoost", "adaboost", "C5.0"),
  tuneList = list(
    "svmRadial" = caretEnsemble::caretModelSpec(method = "svmRadial", tuneGrid = tuneGrids$svmRadial),
    "LogitBoost" = caretEnsemble::caretModelSpec(method = "LogitBoost", tuneGrid = tuneGrids$LogitBoost),
    "adaboost" = caretEnsemble::caretModelSpec(method = "adaboost", tuneLength = 10),
    "C5.0" = caretEnsemble::caretModelSpec(method = "C5.0", tuneGrid = tuneGrids$C5.0)
  )
)

stopCluster(cl)
registerDoSEQ()

Session Info:

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
 [1] caret_6.0-79      lattice_0.20-35   C50_0.1.1        
 [4] caTools_1.17.1    kernlab_0.9-25    klaR_0.6-14      
 [7] MASS_7.3-47       doParallel_1.0.11 iterators_1.0.9  
[10] foreach_1.4.4     magrittr_1.5      htmltools_0.3.6  
[13] forcats_0.3.0     stringr_1.3.0     dplyr_0.7.4      
[16] purrr_0.2.4       readr_1.1.1       tidyr_0.8.0      
[19] tibble_1.4.2      ggplot2_2.2.1     tidyverse_1.2.1  

loaded via a namespace (and not attached):
 [1] Cubist_0.2.1          colorspace_1.3-2      class_7.3-14         
 [4] rprojroot_1.3-2       rstudioapi_0.7.0-9000 DRR_0.0.3            
 [7] prodlim_1.6.1         mvtnorm_1.0-7         lubridate_1.7.3      
[10] xml2_1.2.0            R.methodsS3_1.7.1     codetools_0.2-15     
[13] splines_3.4.3         mnormt_1.5-5          robustbase_0.92-8    
[16] libcoin_1.0-1         knitr_1.20            RcppRoll_0.2.2       
[19] Formula_1.2-2         jsonlite_1.5          broom_0.4.3          
[22] ddalpha_1.3.1.1       R.oo_1.21.0           sfsmisc_1.1-1        
[25] shiny_1.0.5           compiler_3.4.3        httr_1.3.1           
[28] backports_1.1.2       assertthat_0.2.0      Matrix_1.2-12        
[31] lazyeval_0.2.1        cli_1.0.0             tools_3.4.3          
[34] bindrcpp_0.2          partykit_1.2-0        gtable_0.2.0         
[37] glue_1.2.0            reshape2_1.4.3        Rcpp_0.12.16         
[40] cellranger_1.1.0      nlme_3.1-131          repmis_0.5           
[43] psych_1.8.3.3         timeDate_3043.102     inum_1.0-0           
[46] gower_0.1.2           rvest_0.3.2           mime_0.5             
[49] miniUI_0.1.1          DEoptimR_1.0-8        scales_0.5.0         
[52] ipred_0.9-6           hms_0.4.1             RColorBrewer_1.1-2   
[55] curl_3.1              yaml_2.1.18           pbapply_1.3-4        
[58] gridExtra_2.3         rpart_4.1-11          stringi_1.1.7        
[61] highr_0.6             lava_1.6              bitops_1.0-6         
[64] rlang_0.2.0           pkgconfig_2.0.1       evaluate_0.10.1      
[67] bindr_0.1             recipes_0.1.2         CVST_0.2-1           
[70] tidyselect_0.2.4      plyr_1.8.4            R6_2.2.2             
[73] combinat_0.0-8        dimRed_0.1.0          pillar_1.1.0         
[76] haven_1.1.1           foreign_0.8-69        withr_2.1.1          
[79] RCurl_1.95-4.10       survival_2.41-3       nnet_7.3-12          
[82] modelr_0.1.1          crayon_1.3.4          questionr_0.6.2      
[85] rmarkdown_1.9         grid_3.4.3            readxl_1.0.0.9000    
[88] data.table_1.10.4-3   ModelMetrics_1.1.0    digest_0.6.15        
[91] R.cache_0.13.0        xtable_1.8-2          caretEnsemble_2.0.0  
[94] httpuv_1.3.5          R.utils_2.6.0         stats4_3.4.3         
[97] munsell_0.4.3

Minimal, reproducible example with UCI data:

The code below results in the following error:

Error in names(res$trainingData) %in% as.character(form[[2]]) : argument "form" is missing, with no default

Running this code requires the libraries from the above example to be loaded.

# wdbc.data has no header row, so read it with header = FALSE to keep the first record
bc <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data", header = FALSE)
# Extract the attribute names from lines 110-126 of the wdbc.names description file
nms <- readLines("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.names")[110:126] %>%
  str_match("^\\\t?[a-z1-2]\\)\\s?(\\w+\\s?\\w+)") %>%
  na.omit %>%
  .[, 2]
names(bc) <- c(nms[1:2],
               nms[3:12] %>% paste0(".mean"),
               nms[3:12] %>% paste0(".se"),
               nms[3:12] %>% paste0(".wst")) %>%
  gsub("\\s", "\\.", .)
# Use the ID column as row names, then drop it
rownames(bc) <- bc[, 1]
bc <- bc[, -1]

# Parallel backend
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)
getDoParWorkers()

# Resampling indices, then the trainControl object that uses them
data.train <- caret::createMultiFolds(bc$Diagnosis, times = 2)
data.train <- caret::trainControl(method = "repeatedcv",
                                  index = data.train,
                                  number = 10,
                                  repeats = 1,
                                  search = "grid",
                                  allowParallel = TRUE,
                                  classProbs = TRUE,
                                  savePredictions = "all",
                                  returnResamp = "all",
                                  summaryFunction = caret::twoClassSummary)
f <- as.formula(paste0("Diagnosis", " ~ ."))

# Note: tuneGrids is not defined in this snippet
mod.list <- caretEnsemble::caretList(
  formula = f,
  data = bc,
  trControl = data.train,
  methodList = c("svmRadial", "LogitBoost", "adaboost", "C5.0"),
  metric = "ROC",
  tuneList = list(
    "svmRadial" = caretEnsemble::caretModelSpec(method = "svmRadial", tuneGrid = tuneGrids$svmRadial),
    "LogitBoost" = caretEnsemble::caretModelSpec(method = "LogitBoost", tuneGrid = tuneGrids$LogitBoost),
    "adaboost" = caretEnsemble::caretModelSpec(method = "adaboost", tuneLength = 10),
    "C5.0" = caretEnsemble::caretModelSpec(method = "C5.0", tuneGrid = tuneGrids$C5.0)
  )
)

stopCluster(cl)
registerDoSEQ()
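
A possible reading of the 'argument "form" is missing' error in the UCI example, offered as an assumption rather than a confirmed diagnosis: if the function that ultimately receives the formula names its parameter form (as the error message suggests), then supplying it as formula = will not fill that parameter, because R only matches a supplied name that is a prefix of a formal name. The base R sketch below illustrates just that matching rule. demo_fit and toy are hypothetical names invented for the illustration and are not part of caret or caretEnsemble.

```r
# Hypothetical stand-in for a function whose first formal is named `form`.
# This is NOT caret's code; it only demonstrates R's argument matching.
demo_fit <- function(form, data, ...) {
  if (missing(form)) {
    return("form is missing")
  }
  paste("form supplied:", deparse(form))
}

# Tiny made-up data frame, only so the calls have a `data` argument
toy <- data.frame(y = c(0, 1), x = c(1, 2))

# `formula` is not a prefix of `form`, so it is swallowed by `...`
# and `form` stays missing
res_named <- demo_fit(formula = y ~ x, data = toy)

# Passing the formula positionally (or as `form =`) fills `form`
res_positional <- demo_fit(y ~ x, data = toy)

print(res_named)
print(res_positional)
```

If the same rule is what triggers the error here, passing the formula as the first, unnamed argument to caretList would be the thing to try. That is untested against this example.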
