topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

svmPoly Error in classification #769

Closed elpidiofilho closed 6 years ago

elpidiofilho commented 6 years ago

caret (GitHub version 6.0-77) displays an error message when I try to fit an svmPoly model.

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :27    NA's   :27   
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)

I also tried the code from the caret regression test at https://github.com/topepo/caret/blob/master/RegressionTests/Code/svmPoly.R and I get the same error message.

Code :

library(caret)
library(plyr)
library(recipes)
library(dplyr)

model <- "svmPoly"

for(i in getModelInfo(model)[[1]]$library) {
  do.call("loadNamespace", list(package = i))
}

set.seed(2)
training <- twoClassSim(50, linearVars = 2)
testing <- twoClassSim(500, linearVars = 2)
trainX <- training[, -ncol(training)]
trainY <- training$Class

cctrl1 <- trainControl(method = "cv", number = 3, returnResamp = "all")

set.seed(849)
test_class_cv_model <- train(trainX, trainY, 
                             method = "svmPoly", 
                             trControl = cctrl1,
                             preProc = c("center", "scale"))

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252
[3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] recipes_0.1.0.9000 broom_0.4.2        dplyr_0.7.4        plyr_1.8.4
[5] caret_6.0-77.9000  ggplot2_2.2.1      lattice_0.20-35

topepo commented 6 years ago

Take a look at the warnings:

kernlab class prediction calculations failed; returning NAs
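
One way to see such warnings even though the call stops with an error is to record them as they are raised. The helper below is a base R sketch: `collect_warnings` is a hypothetical name, and the expression passed to it only imitates the failure above rather than calling caret.

```r
# Hypothetical helper: evaluate an expression, record every warning message,
# and return the error object instead of stopping if the expression fails.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    tryCatch(expr, error = function(e) e),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Stand-in for a failing train() call (not real caret output)
res <- collect_warnings({
  warning("kernlab class prediction calculations failed; returning NAs")
  stop("Stopping")
})
unique(res$warnings)
```

In practice the braces would wrap the real `train()` call, and `unique()` keeps the list short when the same warning repeats for every resample.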

SVMs don't naturally produce probability estimates, so a secondary model (basically a logistic model) is fit. This extra model is supposed to translate the SVM output to probabilities, and here it failed. This happens periodically, and there isn't much that anything outside of kernlab can do to fix it.
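
The "secondary model" idea can be sketched in base R. This is only an illustration of the general technique (commonly called Platt scaling): the decision values are simulated stand-ins for SVM output, and `glm()` stands in for whatever kernlab does internally. It is not kernlab's code.

```r
set.seed(1)
n <- 200

# Simulated stand-in for SVM decision values (assumption: larger values
# lean toward class "B"); real values would come from a fitted SVM.
decision <- rnorm(n)
truth <- factor(ifelse(decision + rnorm(n, sd = 0.8) > 0, "B", "A"),
                levels = c("A", "B"))

# Logistic model mapping decision values to class probabilities.
# With a two-level factor, glm() models the probability of the second level.
platt <- glm(truth ~ decision, family = binomial())

# Probability of class "B" for three new decision values
prob_B <- predict(platt,
                  newdata = data.frame(decision = c(-2, 0, 2)),
                  type = "response")
prob_B
```

If the logistic fit itself fails, there are no probabilities to return, which is consistent with the NA results described above.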

Out of curiosity, why run loadNamespace?

elpidiofilho commented 6 years ago

Max, the code above works correctly if I add a call to library(kernlab) in my source code. It seems to me that caret is not loading the kernlab library before running the svmPoly classifier.

This code runs fine with the call to library(kernlab), but if I remove that line I get an error.

library(caret)
library(kernlab)

model <- "svmPoly"

set.seed(2)
training <- twoClassSim(50, linearVars = 2)
trainX <- training[, -ncol(training)]
trainY <- training$Class

cctrl1 <- trainControl(method = "cv", number = 3, returnResamp = "all")

set.seed(849)
test_class_cv_model <- train(trainX, trainY, 
                             method = "svmPoly", 
                             trControl = cctrl1,
                             preProc = c("center", "scale"))
topepo commented 6 years ago

> caret is not loading the kernlab library before running the svmPoly classifier.

It used to load the library and that was fairly bad form. As of the last version, it loads the namespace instead and this avoids name collisions. You should not have to do that with the current devel or CRAN versions of caret. I did not use it to get your code to run.
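
The difference between loading a namespace and attaching a package can be seen with any installed package. The sketch below uses the base package tools purely as a neutral example; it is an assumption for illustration and has nothing to do with how caret handles kernlab internally.

```r
# Loading a namespace makes pkg::fun() usable without changing the search path.
loadNamespace("tools")
namespace_loaded <- isNamespaceLoaded("tools")
ext <- tools::file_ext("svmPoly.R")   # explicit namespace-qualified call

# Attaching (what library() does) also puts the package on the search path,
# where its exported names can mask same-named functions from other packages.
library(tools)
attached_after <- "package:tools" %in% search()

c(namespace_loaded = namespace_loaded, attached_after = attached_after)
```

Code that relies only on a loaded namespace has to call functions as `pkg::fun()`; an unqualified call works only once the package is attached.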

Keep in mind that the logistic model that I mentioned uses random numbers and, even when setting the seed, you might not be able to get reproducible failures or successes from ksvm (at least that was the case the last time I answered this question).

topepo commented 6 years ago

This should be fine with the current devel. I fixed these issues today but it would be good to have someone do an external test on their system.

elpidiofilho commented 6 years ago

Max, I installed the current devel and the errors that I had reported (lasso and svmPoly) disappeared. I then decided to run all of the caret package's regression methods on the airquality dataset. Below is a summary of the results.

dataset : airquality

Original: 128 regression models

Removed slow models
[1] "ANFIS"        "bartMachine"  "DENFIS"       "earth"        "FIR.DM"       "FS.HGD"       "GFS.FR.MOGUL" "GFS.LT.RS"   
[9] "GFS.THRIFT"   "HYFIS"      

[1] "failed models" 45
 [1] "bag"                 "BstLm"               "dnn"                 "gaussprRadial"       "gbm_h2o"            
 [6] "glm.nb"              "glmnet_h2o"          "logicBag"            "logreg"              "mlpKerasDecay"      
[11] "mlpKerasDropout"     "mlpSGD"              "msaenet"             "mxnet"               "mxnetAdam"          
[16] "nnls"                "null"                "ordinalNet"          "parRF"               "penalized"          
[21] "plsRglm"             "pythonKnnReg"        "qrf"                 "randomGLM"           "ranger"             
[26] "rbf"                 "Rborist"             "rfRules"             "rqlasso"             "rqnc"               
[31] "RRF"                 "RRFglobal"           "SBC"                 "spikeslab"           "svmBoundrangeString"
[36] "svmExpoString"       "svmLinear2"          "svmLinear3"          "svmPoly"             "svmSpectrumString"  
[41] "treebag"             "xgbDART"             "xgbLinear"           "xgbTree"             "xyf"     

Successful model runs = 73

Models that depend on libraries that aren't on CRAN
msaenet, mxnetAdam --> mxnet
pythonKnnReg --> rPython
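
A survey like this can be run without one failure stopping the loop by wrapping each fit in `tryCatch()`. The sketch below is hypothetical: `try_models` is an invented helper, and the two fitting functions are stand-ins (plain `lm()` and a deliberate error) for what would be `caret::train()` calls in the real run.

```r
# Hypothetical helper: run each fitting function and record success or
# failure instead of letting one error stop the whole comparison.
try_models <- function(fitters) {
  lapply(fitters, function(fit_fun) {
    tryCatch(
      list(ok = TRUE, fit = fit_fun(), message = NA_character_),
      error = function(e) {
        list(ok = FALSE, fit = NULL, message = conditionMessage(e))
      }
    )
  })
}

d <- na.omit(airquality)[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Stand-in fitters; in practice each would wrap a train() call
fitters <- list(
  lm     = function() lm(Ozone ~ ., data = d),
  broken = function() stop("could not find function \"glm.nb\"")
)

results <- try_models(fitters)
failed <- names(results)[!vapply(results, function(r) r$ok, logical(1))]
failed
```

Keeping the error message next to each failed model makes it easier to separate genuine bugs from missing dependencies or unsuitable data.
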
topepo commented 6 years ago

Some of those are not included in the package because there are issues with the code (and the people who added them aren't responding, such as pythonKnnReg). If you looked at what is in the models directory, there are some that shouldn't be included for testing. Use the models that are available via getModelInfo.
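
As a sketch of that suggestion, the list returned by `getModelInfo()` can be filtered on each model's `type` element (the same element queried elsewhere in this thread). The variable names are arbitrary, and the code does nothing beyond returning an empty vector if caret is not installed.

```r
regression_models <- character(0)

if (requireNamespace("caret", quietly = TRUE)) {
  info <- caret::getModelInfo()   # all models registered in the installed caret
  is_regression <- vapply(info,
                          function(m) "Regression" %in% m$type,
                          logical(1))
  regression_models <- names(info)[is_regression]
}

length(regression_models)
```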

Also, there are some in there that make no sense to run for that data set (e.g. svmExpoString) and others that depend on external libraries (e.g. keras, h2o). Were those installed and verified to be working?
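
A quick way to check the R side of that question is to test whether each required package can be loaded. `missing_packages` below is a hypothetical helper written in base R; the package names passed to it are examples, and in practice the list could come from `getModelInfo(model)[[1]]$library` as in the first post. It says nothing about non-R backends such as a Python or Java installation.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Example call with two base packages and one name that should not exist
missing_packages(c("stats", "utils", "aPackageThatDoesNotExist123"))
```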

Also, I ran all of the regression tests yesterday after a big commit and they worked for those test cases.

For the svm models, let's start by getting reprexes (= {repr}oducible {ex}amples) for these test cases so that I can try to reproduce them. Also, I suggest using the sessioninfo package to get version information since that gives a lot more detail (but that's not required).
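
A minimal sketch of collecting that version information, assuming the sessioninfo package may or may not be installed and falling back to base R's `sessionInfo()` when it is not:

```r
# Prefer sessioninfo::session_info() when available; otherwise use base R.
info <- if (requireNamespace("sessioninfo", quietly = TRUE)) {
  sessioninfo::session_info()
} else {
  utils::sessionInfo()
}
print(info)
```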

As an aside, I've thought about removing all of the frbs models because they are very slow and tend to consistently fail the basic regression tests.

elpidiofilho commented 6 years ago

for bag model


suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(caret))
data("airquality")
d = airquality %>% na.omit() %>% select(-Month,-Day) %>% data.frame()

set.seed(313)
va = caret::createDataPartition(d[,1], p = 0.75, list = F)
train = d[va,]
test  = d[-va,]

resample_ = 'cv' 
nfolds = 5; 
regressor = 'bag'
caret::getModelInfo(regressor, regex = F)[[1]]$type
#> [1] "Regression"     "Classification"
tc <- trainControl( method = resample_,  number = nfolds)

fit1 = caret::train(x = train[,-1],  y = train[,1], method = regressor,  metric = 'Rsquared', trControl = tc)
#> Warning: model fit failed for Fold1: vars=3 Error in bag.default(x, y, vars = param$vars, ...) : 
#>   Please specify 'bagControl' with the appropriate functions
#> Warning: model fit failed for Fold2: vars=3 Error in bag.default(x, y, vars = param$vars, ...) : 
#>   Please specify 'bagControl' with the appropriate functions
#> Warning: model fit failed for Fold3: vars=3 Error in bag.default(x, y, vars = param$vars, ...) : 
#>   Please specify 'bagControl' with the appropriate functions
#> Warning: model fit failed for Fold4: vars=3 Error in bag.default(x, y, vars = param$vars, ...) : 
#>   Please specify 'bagControl' with the appropriate functions
#> Warning: model fit failed for Fold5: vars=3 Error in bag.default(x, y, vars = param$vars, ...) : 
#>   Please specify 'bagControl' with the appropriate functions
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
#> trainInfo, : There were missing values in resampled performance measures.
#> Something is wrong; all the Rsquared metric values are missing:
#>       RMSE        Rsquared        MAE     
#>  Min.   : NA   Min.   : NA   Min.   : NA  
#>  1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
#>  Median : NA   Median : NA   Median : NA  
#>  Mean   :NaN   Mean   :NaN   Mean   :NaN  
#>  3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
#>  Max.   : NA   Max.   : NA   Max.   : NA  
#>  NA's   :1     NA's   :1     NA's   :1
#> Error: Stopping
warnings()
#> NULL
fit1
#> Error in eval(expr, envir, enclos): object 'fit1' not found

if (is.null(fit1) == FALSE) {
  v = predict(fit1, test[,-1])
  plot(v, test$Ozone)
  abline(0,1)
  caret::postResample((unlist(v)), test$Ozone)
}                    
#> Error in eval(expr, envir, enclos): object 'fit1' not found

sessionInfo()
#> R version 3.4.2 (2017-09-28)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 15063)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
#> [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=Portuguese_Brazil.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] caret_6.0-77.9000 ggplot2_2.2.1     lattice_0.20-35   dplyr_0.7.4      
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.13       lubridate_1.7.1    tidyr_0.7.2       
#>  [4] class_7.3-14       assertthat_0.2.0   rprojroot_1.2     
#>  [7] digest_0.6.12      ipred_0.9-6        psych_1.7.8       
#> [10] foreach_1.4.3      R6_2.2.2           plyr_1.8.4        
#> [13] backports_1.1.1    stats4_3.4.2       evaluate_0.10.1   
#> [16] rlang_0.1.4        lazyeval_0.2.1     kernlab_0.9-25    
#> [19] rpart_4.1-11       Matrix_1.2-11      rmarkdown_1.6     
#> [22] splines_3.4.2      CVST_0.2-1         ddalpha_1.3.1     
#> [25] gower_0.1.2        stringr_1.2.0      foreign_0.8-69    
#> [28] munsell_0.4.3      broom_0.4.2        compiler_3.4.2    
#> [31] pkgconfig_2.0.1    mnormt_1.5-5       dimRed_0.1.0      
#> [34] htmltools_0.3.6    nnet_7.3-12        tidyselect_0.2.3  
#> [37] tibble_1.3.4       prodlim_1.6.1      DRR_0.0.2         
#> [40] codetools_0.2-15   RcppRoll_0.2.3     withr_2.1.0       
#> [43] MASS_7.3-47        recipes_0.1.0.9000 ModelMetrics_1.1.0
#> [46] grid_3.4.2         nlme_3.1-131       gtable_0.2.0      
#> [49] magrittr_1.5       scales_0.5.0       stringi_1.1.5     
#> [52] reshape2_1.4.2     bindrcpp_0.2       timeDate_3012.100 
#> [55] robustbase_0.92-8  lava_1.5.1         iterators_1.0.8   
#> [58] tools_3.4.2        glue_1.2.0         DEoptimR_1.0-8    
#> [61] purrr_0.2.4        sfsmisc_1.1-1      parallel_3.4.2    
#> [64] survival_2.41-3    yaml_2.1.14        colorspace_1.3-2  
#> [67] knitr_1.17         bindr_0.1
elpidiofilho commented 6 years ago

for BstLm model

Sys.setenv(LANG="EN") 
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(caret))
data("airquality")
d = airquality %>% na.omit() %>% select(-Month,-Day) %>% data.frame()

set.seed(313)
va = caret::createDataPartition(d[,1], p = 0.75, list = F)
train = d[va,]
test  = d[-va,]

resample_ = 'cv' 
nfolds = 5; 
regressor = "BstLm"
caret::getModelInfo(regressor, regex = F)[[1]]$type
#> [1] "Regression"     "Classification"
tc <- trainControl( method = resample_,  number = nfolds)

fit1 = caret::train(x = train[,-1],  y = train[,1], method = regressor,  metric = 'Rsquared', trControl = tc)
#> Error in .(nu): could not find function "."
warnings()
#> NULL
fit1
#> Error in eval(expr, envir, enclos): object 'fit1' not found

if (is.null(fit1) == FALSE) {
  v = predict(fit1, test[,-1])
  plot(v, test$Ozone)
  abline(0,1)
  caret::postResample((unlist(v)), test$Ozone)
}                    
#> Error in eval(expr, envir, enclos): object 'fit1' not found

sessionInfo()
#> R version 3.4.2 (2017-09-28)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 15063)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
#> [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=Portuguese_Brazil.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] caret_6.0-77.9000 ggplot2_2.2.1     lattice_0.20-35   dplyr_0.7.4      
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.13       lubridate_1.7.1    tidyr_0.7.2       
#>  [4] class_7.3-14       assertthat_0.2.0   rprojroot_1.2     
#>  [7] digest_0.6.12      ipred_0.9-6        psych_1.7.8       
#> [10] foreach_1.4.3      R6_2.2.2           plyr_1.8.4        
#> [13] backports_1.1.1    stats4_3.4.2       evaluate_0.10.1   
#> [16] rlang_0.1.4        lazyeval_0.2.1     kernlab_0.9-25    
#> [19] rpart_4.1-11       Matrix_1.2-11      rmarkdown_1.6     
#> [22] splines_3.4.2      CVST_0.2-1         ddalpha_1.3.1     
#> [25] gower_0.1.2        stringr_1.2.0      foreign_0.8-69    
#> [28] munsell_0.4.3      broom_0.4.2        compiler_3.4.2    
#> [31] pkgconfig_2.0.1    mnormt_1.5-5       dimRed_0.1.0      
#> [34] gbm_2.1.3          htmltools_0.3.6    nnet_7.3-12       
#> [37] tidyselect_0.2.3   tibble_1.3.4       prodlim_1.6.1     
#> [40] DRR_0.0.2          codetools_0.2-15   RcppRoll_0.2.3    
#> [43] withr_2.1.0        MASS_7.3-47        recipes_0.1.0.9000
#> [46] ModelMetrics_1.1.0 grid_3.4.2         nlme_3.1-131      
#> [49] gtable_0.2.0       magrittr_1.5       scales_0.5.0      
#> [52] stringi_1.1.5      reshape2_1.4.2     doParallel_1.0.11 
#> [55] bst_0.3-14         bindrcpp_0.2       timeDate_3012.100 
#> [58] robustbase_0.92-8  lava_1.5.1         iterators_1.0.8   
#> [61] tools_3.4.2        glue_1.2.0         DEoptimR_1.0-8    
#> [64] purrr_0.2.4        sfsmisc_1.1-1      parallel_3.4.2    
#> [67] survival_2.41-3    yaml_2.1.14        colorspace_1.3-2  
#> [70] knitr_1.17         bindr_0.1
elpidiofilho commented 6 years ago

for gaussprRadial model

Sys.setenv(LANG="EN") 
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(caret))
data("airquality")
d = airquality %>% na.omit() %>% select(-Month,-Day) %>% data.frame()

set.seed(313)
va = caret::createDataPartition(d[,1], p = 0.75, list = F)
train = d[va,]
test  = d[-va,]

resample_ = 'cv' 
nfolds = 5; 
regressor = "gaussprRadial"
caret::getModelInfo(regressor, regex = F)[[1]]$type
#> [1] "Regression"     "Classification"
tc <- trainControl( method = resample_,  number = nfolds)

fit1 = caret::train(x = train[,-1],  y = train[,1], method = regressor,  metric = 'Rsquared', trControl = tc)
#> Warning: predictions failed for Fold1: sigma=0.3993 Error in UseMethod("predict") : 
#>   no applicable method for 'predict' applied to an object of class "c('gausspr', 'vm')"
#> Warning: predictions failed for Fold2: sigma=0.3993 Error in UseMethod("predict") : 
#>   no applicable method for 'predict' applied to an object of class "c('gausspr', 'vm')"
#> Warning: predictions failed for Fold3: sigma=0.3993 Error in UseMethod("predict") : 
#>   no applicable method for 'predict' applied to an object of class "c('gausspr', 'vm')"
#> Warning: predictions failed for Fold4: sigma=0.3993 Error in UseMethod("predict") : 
#>   no applicable method for 'predict' applied to an object of class "c('gausspr', 'vm')"
#> Warning: predictions failed for Fold5: sigma=0.3993 Error in UseMethod("predict") : 
#>   no applicable method for 'predict' applied to an object of class "c('gausspr', 'vm')"
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
#> trainInfo, : There were missing values in resampled performance measures.
#> Something is wrong; all the Rsquared metric values are missing:
#>       RMSE        Rsquared        MAE     
#>  Min.   : NA   Min.   : NA   Min.   : NA  
#>  1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
#>  Median : NA   Median : NA   Median : NA  
#>  Mean   :NaN   Mean   :NaN   Mean   :NaN  
#>  3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
#>  Max.   : NA   Max.   : NA   Max.   : NA  
#>  NA's   :1     NA's   :1     NA's   :1
#> Error: Stopping
warnings()
#> NULL
fit1
#> Error in eval(expr, envir, enclos): object 'fit1' not found

if (is.null(fit1) == FALSE) {
  v = predict(fit1, test[,-1])
  plot(v, test$Ozone)
  abline(0,1)
  caret::postResample((unlist(v)), test$Ozone)
}                    
#> Error in eval(expr, envir, enclos): object 'fit1' not found

sessionInfo()
#> R version 3.4.2 (2017-09-28)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 15063)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
#> [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=Portuguese_Brazil.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] caret_6.0-77.9000 ggplot2_2.2.1     lattice_0.20-35   dplyr_0.7.4      
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.13       lubridate_1.7.1    tidyr_0.7.2       
#>  [4] class_7.3-14       assertthat_0.2.0   rprojroot_1.2     
#>  [7] digest_0.6.12      ipred_0.9-6        psych_1.7.8       
#> [10] foreach_1.4.3      R6_2.2.2           plyr_1.8.4        
#> [13] backports_1.1.1    stats4_3.4.2       evaluate_0.10.1   
#> [16] rlang_0.1.4        lazyeval_0.2.1     kernlab_0.9-25    
#> [19] rpart_4.1-11       Matrix_1.2-11      rmarkdown_1.6     
#> [22] splines_3.4.2      CVST_0.2-1         ddalpha_1.3.1     
#> [25] gower_0.1.2        stringr_1.2.0      foreign_0.8-69    
#> [28] munsell_0.4.3      broom_0.4.2        compiler_3.4.2    
#> [31] pkgconfig_2.0.1    mnormt_1.5-5       dimRed_0.1.0      
#> [34] htmltools_0.3.6    nnet_7.3-12        tidyselect_0.2.3  
#> [37] tibble_1.3.4       prodlim_1.6.1      DRR_0.0.2         
#> [40] codetools_0.2-15   RcppRoll_0.2.3     withr_2.1.0       
#> [43] MASS_7.3-47        recipes_0.1.0.9000 ModelMetrics_1.1.0
#> [46] grid_3.4.2         nlme_3.1-131       gtable_0.2.0      
#> [49] magrittr_1.5       scales_0.5.0       stringi_1.1.5     
#> [52] reshape2_1.4.2     bindrcpp_0.2       timeDate_3012.100 
#> [55] robustbase_0.92-8  lava_1.5.1         iterators_1.0.8   
#> [58] tools_3.4.2        glue_1.2.0         DEoptimR_1.0-8    
#> [61] purrr_0.2.4        sfsmisc_1.1-1      parallel_3.4.2    
#> [64] survival_2.41-3    yaml_2.1.14        colorspace_1.3-2  
#> [67] knitr_1.17         bindr_0.1
elpidiofilho commented 6 years ago

for glm.nb model

Sys.setenv(LANG="EN") 
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(caret))
data("airquality")
d = airquality %>% na.omit() %>% select(-Month,-Day) %>% data.frame()

set.seed(313)
va = caret::createDataPartition(d[,1], p = 0.75, list = F)
train = d[va,]
test  = d[-va,]

resample_ = 'cv' 
nfolds = 5; 
regressor = "glm.nb"
caret::getModelInfo(regressor, regex = F)[[1]]$type
#> [1] "Regression"
tc <- trainControl( method = resample_,  number = nfolds)

fit1 = caret::train(x = train[,-1],  y = train[,1], method = regressor,  metric = 'Rsquared', trControl = tc)
#> Warning: model fit failed for Fold1: link=log Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold1: link=sqrt Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold1: link=identity Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold2: link=log Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold2: link=sqrt Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold2: link=identity Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold3: link=log Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold3: link=sqrt Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold3: link=identity Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold4: link=log Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold4: link=sqrt Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold4: link=identity Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(190L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold5: link=log Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(299L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold5: link=sqrt Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(299L,  : 
#>   could not find function "glm.nb"
#> Warning: model fit failed for Fold5: link=identity Error in glm.nb(formula = .outcome ~ ., data = structure(list(Solar.R = c(299L,  : 
#>   could not find function "glm.nb"
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
#> trainInfo, : There were missing values in resampled performance measures.
#> Something is wrong; all the Rsquared metric values are missing:
#>       RMSE        Rsquared        MAE     
#>  Min.   : NA   Min.   : NA   Min.   : NA  
#>  1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
#>  Median : NA   Median : NA   Median : NA  
#>  Mean   :NaN   Mean   :NaN   Mean   :NaN  
#>  3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
#>  Max.   : NA   Max.   : NA   Max.   : NA  
#>  NA's   :3     NA's   :3     NA's   :3
#> Error: Stopping
warnings()
#> NULL
fit1
#> Error in eval(expr, envir, enclos): object 'fit1' not found

if (is.null(fit1) == FALSE) {
  v = predict(fit1, test[,-1])
  plot(v, test$Ozone)
  abline(0,1)
  caret::postResample((unlist(v)), test$Ozone)
}                    
#> Error in eval(expr, envir, enclos): object 'fit1' not found

sessionInfo()
#> R version 3.4.2 (2017-09-28)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 15063)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
#> [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=Portuguese_Brazil.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] caret_6.0-77.9000 ggplot2_2.2.1     lattice_0.20-35   dplyr_0.7.4      
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.13       lubridate_1.7.1    tidyr_0.7.2       
#>  [4] class_7.3-14       assertthat_0.2.0   rprojroot_1.2     
#>  [7] digest_0.6.12      ipred_0.9-6        psych_1.7.8       
#> [10] foreach_1.4.3      R6_2.2.2           plyr_1.8.4        
#> [13] backports_1.1.1    stats4_3.4.2       evaluate_0.10.1   
#> [16] rlang_0.1.4        lazyeval_0.2.1     kernlab_0.9-25    
#> [19] rpart_4.1-11       Matrix_1.2-11      rmarkdown_1.6     
#> [22] splines_3.4.2      CVST_0.2-1         ddalpha_1.3.1     
#> [25] gower_0.1.2        stringr_1.2.0      foreign_0.8-69    
#> [28] munsell_0.4.3      broom_0.4.2        compiler_3.4.2    
#> [31] pkgconfig_2.0.1    mnormt_1.5-5       dimRed_0.1.0      
#> [34] htmltools_0.3.6    nnet_7.3-12        tidyselect_0.2.3  
#> [37] tibble_1.3.4       prodlim_1.6.1      DRR_0.0.2         
#> [40] codetools_0.2-15   RcppRoll_0.2.3     withr_2.1.0       
#> [43] MASS_7.3-47        recipes_0.1.0.9000 ModelMetrics_1.1.0
#> [46] grid_3.4.2         nlme_3.1-131       gtable_0.2.0      
#> [49] magrittr_1.5       scales_0.5.0       stringi_1.1.5     
#> [52] reshape2_1.4.2     bindrcpp_0.2       timeDate_3012.100 
#> [55] robustbase_0.92-8  lava_1.5.1         iterators_1.0.8   
#> [58] tools_3.4.2        glue_1.2.0         DEoptimR_1.0-8    
#> [61] purrr_0.2.4        sfsmisc_1.1-1      parallel_3.4.2    
#> [64] survival_2.41-3    yaml_2.1.14        colorspace_1.3-2  
#> [67] knitr_1.17         bindr_0.1
topepo commented 6 years ago

I've updated a few models (and their tests). In some cases, the data packages loaded the library in the tests, so the error was not caught.

However, on the bagging model, please note the error message: Please specify 'bagControl' with the appropriate functions. This is not an issue with caret.
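
A sketch of what supplying `bagControl` might look like for this data, assuming the caret and party packages are installed. The choice of `ctreeBag` (conditional inference trees) is an assumption made for illustration, not something recommended in this thread, and the block is skipped entirely when either package is missing.

```r
bag_fit <- NULL

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("party", quietly = TRUE)) {
  library(caret)

  d <- na.omit(airquality)[, c("Ozone", "Solar.R", "Wind", "Temp")]

  set.seed(313)
  bag_fit <- train(
    x = d[, -1], y = d[, 1],
    method = "bag",
    # The fit/predict/aggregate functions that the error message asks for
    bagControl = bagControl(fit = ctreeBag$fit,
                            predict = ctreeBag$pred,
                            aggregate = ctreeBag$aggregate),
    trControl = trainControl(method = "cv", number = 5)
  )
}

bag_fit
```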

I also tested a lot of models on your list of 45 and the vast majority of them were false positives.

elpidiofilho commented 6 years ago

For null model

Sys.setenv(LANG="EN") 
suppressPackageStartupMessages(library(dplyr))
#> Warning: package 'dplyr' was built under R version 3.4.2
suppressPackageStartupMessages(library(caret))
data("airquality")
d = airquality %>% na.omit() %>% select(-Month,-Day) %>% data.frame()

set.seed(313)
va = caret::createDataPartition(d[,1], p = 0.75, list = F)
train = d[va,]
test  = d[-va,]

resample_ = 'cv' 
nfolds = 5; 
regressor = "null"
caret::getModelInfo(regressor, regex = F)[[1]]$type
#> [1] "Classification" "Regression"
tc <- trainControl( method = resample_,  number = nfolds)

fit1 = caret::train(x = train[,-1],  y = train[,1], method = regressor,  metric = 'Rsquared', trControl = tc)
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
#> trainInfo, : There were missing values in resampled performance measures.
#> Something is wrong; all the Rsquared metric values are missing:
#>       RMSE          Rsquared        MAE       
#>  Min.   :31.55   Min.   : NA   Min.   :25.16  
#>  1st Qu.:31.55   1st Qu.: NA   1st Qu.:25.16  
#>  Median :31.55   Median : NA   Median :25.16  
#>  Mean   :31.55   Mean   :NaN   Mean   :25.16  
#>  3rd Qu.:31.55   3rd Qu.: NA   3rd Qu.:25.16  
#>  Max.   :31.55   Max.   : NA   Max.   :25.16  
#>                  NA's   :1
#> Error: Stopping
warnings()
#> NULL
fit1
#> Error in eval(expr, envir, enclos): object 'fit1' not found

if (is.null(fit1) == FALSE) {
  v = predict(fit1, test[,-1])
  plot(v, test$Ozone)
  abline(0,1)
  caret::postResample((unlist(v)), test$Ozone)
}                    
#> Error in eval(expr, envir, enclos): object 'fit1' not found

sessionInfo()
#> R version 3.4.1 (2017-06-30)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 15063)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
#> [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=Portuguese_Brazil.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] caret_6.0-77.9000 ggplot2_2.2.1     lattice_0.20-35   dplyr_0.7.4      
#> 
#> loaded via a namespace (and not attached):
#>  [1] purrr_0.2.4          reshape2_1.4.2       kernlab_0.9-25      
#>  [4] splines_3.4.1        colorspace_1.3-2     stats4_3.4.1        
#>  [7] htmltools_0.3.6      yaml_2.1.14          survival_2.41-3     
#> [10] prodlim_1.6.1        rlang_0.1.4          ModelMetrics_1.1.0  
#> [13] withr_2.1.0          glue_1.2.0           bindrcpp_0.2        
#> [16] foreach_1.4.3        bindr_0.1            plyr_1.8.4          
#> [19] dimRed_0.1.0         lava_1.5.1           robustbase_0.92-8   
#> [22] stringr_1.2.0        timeDate_3012.100    munsell_0.4.3       
#> [25] gtable_0.2.0         recipes_0.1.0        codetools_0.2-15    
#> [28] evaluate_0.10.1      knitr_1.17           class_7.3-14        
#> [31] DEoptimR_1.0-8       Rcpp_0.12.13         scales_0.5.0        
#> [34] backports_1.1.1      ipred_0.9-6          CVST_0.2-1          
#> [37] digest_0.6.12        stringi_1.1.5        RcppRoll_0.2.2      
#> [40] ddalpha_1.3.1        grid_3.4.1           rprojroot_1.2       
#> [43] tools_3.4.1          magrittr_1.5         lazyeval_0.2.1      
#> [46] tibble_1.3.4         DRR_0.0.2            pkgconfig_2.0.1     
#> [49] MASS_7.3-47          Matrix_1.2-11        lubridate_1.7.1     
#> [52] gower_0.1.2          assertthat_0.2.0     rmarkdown_1.6.0.9000
#> [55] iterators_1.0.8      R6_2.2.2             rpart_4.1-11        
#> [58] sfsmisc_1.1-1        nnet_7.3-12          nlme_3.1-131        
#> [61] compiler_3.4.1
topepo commented 6 years ago

This is a false positive. For regression, the null model predicts the mean, so the predictions have no variation and $R^2$ cannot be calculated. You should get an error here.
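
The effect can be reproduced in base R. The sketch assumes $R^2$ is taken as the squared correlation between predictions and observations, and the observed values are made up for illustration.

```r
obs  <- c(41, 36, 12, 18, 23)            # made-up outcome values
pred <- rep(mean(obs), length(obs))      # a null model predicts the mean everywhere

# cor() warns that the standard deviation is zero and returns NA
r2   <- suppressWarnings(cor(pred, obs))^2
rmse <- sqrt(mean((obs - pred)^2))

c(Rsquared = r2, RMSE = rmse)
```

RMSE is still defined for constant predictions, so choosing it as the selection metric is one way to include the null model in a comparison.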