topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

Why does caret produce the error "Something is wrong; all the ROC metric values are missing"? #1332

Closed Canderson156 closed 1 year ago

Canderson156 commented 1 year ago

It seems that many people have had this problem for years, and numerous Stack Overflow questions address it (titles listed below). I've tried all of the solutions they suggest and none of them have worked for me. It would be nice to know what the underlying issue is, as the error message has not been helpful.

- Something is wrong; all the ROC metric values are missing:
- caret - error - Something is wrong - all the ROC metric values are missing:
- error in caret ROC metric : "Something is wrong; all the ROC metric values are missing"
- Using metric ROC in caret train function in R
- Issue using 'ROC' metric in caret train function in R

Here is a reproducible example from my code. I had to cut down the test data, but the error seems the same. The full data set has 44 predictors instead of 8, and 1800 observations instead of 30.

test_data <- structure(list(elevation = c(4L, 4L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 146L, 204L, 291L, 291L, 291L, 291L, 413L, 413L, 413L), stdev_elevation = c(0L, 0L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 40L, 52L, 52L, 52L, 52L, 91L, 91L, 91L), d2coast = c(9L, 8L, 142L, 142L, 140L, 137L, 139L, 140L, 140L, 135L, 135L, 140L, 135L, 137L, 135L, 137L, 135L, 135L, 140L, 137L, 134L, 132L, 3L, 10L, 10L, 10L, 10L, 7L, 7L, 7L), lc_class = structure(c(12L, 12L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 9L, 9L, 9L, 9L, 9L, 2L, 2L, 2L), levels = c("Cryptogam barren complex (bedrock)", "Cryptogam, herb barren", "Erect dwarf-shrub tundra", "Graminoid, prostrate dwarf-shrub, forb tundra", "Low-shrub tundra", "Nontussock sedge, dwarf-shrub, moss tundra", "Prostrate dwarf-shrub, herb tundra", "Prostrate/Hemiprostrate dwarf-shrub tundra", "Rush/grass, forb, cryptogam tundra", "Sedge, moss, dwarf-shrub wetland", "Sedge, moss, low-shrub wetland", "Sedge/grass, moss wetland", "Tussock-sedge, dwarf-shrub, moss tundra"), class = "factor"), substrate = c(3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), elevation2 = c(18L, 18L, 21386L, 21386L, 21465L, 21360L, 21427L, 21465L, 21465L, 21430L, 21430L, 21465L, 21430L, 21360L, 21430L, 21360L, 21430L, 21430L, 21465L, 21360L, 21380L, 21334L, 41625L, 84836L, 84836L, 84836L, 84836L, 170996L, 170996L, 170996L), stdev_elevation2 = c(0L, 0L, 13L, 13L, 10L, 8L, 10L, 10L, 10L, 7L, 7L, 10L, 7L, 8L, 7L, 8L, 7L, 7L, 10L, 8L, 6L, 5L, 1644L, 2723L, 2723L, 2723L, 2723L, 8418L, 8418L, 8418L), d2coast2 = c(81L, 77L, 20236L, 20236L, 19753L, 18932L, 19479L, 19753L, 19753L, 18449L, 18449L, 19753L, 18449L, 18932L, 18449L, 18932L, 18449L, 18449L, 19753L, 18932L, 18064L, 17678L, 11L, 100L, 100L, 100L, 100L, 52L, 52L, 52L), presence = 
c("no", "yes", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no"), Region_Code = c(3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L)), row.names = c(NA, -30L), class = c("tbl_df", "tbl", "data.frame"))

Here is the model I'm having issues with:

library(caret)
library(CAST)
library(pROC)

test_data <- as.data.frame(test_data)

indices <- CreateSpacetimeFolds(test_data, spacevar = "Region_Code", k = 3)
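For readers unfamiliar with leave-location-out folds, here is a rough base-R sketch of the kind of split that `k = 3` over three distinct `Region_Code` values implies: one held-out region per fold, with the remaining rows used for training. This is an illustration only, not CAST's implementation; `region`, `held_out`, and `train_idx` are names made up for this sketch, and the actual fold assignment made by `CreateSpacetimeFolds` may differ.

```r
# Region_Code values copied from test_data above (30 rows, 3 regions)
region <- c(3L, 3L, rep(6L, 20), rep(10L, 8))

# Rows held out in each fold: one fold per region (hypothetical names)
held_out <- split(seq_along(region), region)

# Rows used for training in each fold: everything outside the held-out region
train_idx <- lapply(held_out, function(i) setdiff(seq_along(region), i))

# Fold sizes, to see how uneven the splits are
print(lengths(held_out))
print(lengths(train_idx))
```

With only 30 rows the folds are very unequal in size, which is worth keeping in mind when reading the results below.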

pred <- test_data[,1:8]
obs <- test_data[,9]

This call doesn't work:

model1 <- ffs(predictors = pred,
              response = obs,
              trControl = trainControl(method = 'cv',
                                       number = 12,
                                       summaryFunction = twoClassSummary,
                                       classProbs = TRUE,
                                       savePredictions = TRUE),
              minVar = 2,
              method = 'glm',
              family = 'binomial',
              metric = 'ROC',
              index = indices$index)

[1] "model using elevation,stdev_elevation will be trained now..."
Something is wrong; all the ROC metric values are missing:
      ROC           Sens          Spec
 Min.   : NA   Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA   Max.   : NA
 NA's   :1     NA's   :1     NA's   :1
Error: Stopping
In addition: There were 13 warnings (use warnings() to see them)

It works fine if we remove the index:

model2 <- ffs(predictors = pred,
              response = obs,
              trControl = trainControl(method = 'cv',
                                       number = 12,
                                       summaryFunction = twoClassSummary,
                                       classProbs = TRUE,
                                       savePredictions = TRUE),
              minVar = 2,
              method = 'glm',
              family = 'binomial',
              metric = 'ROC')

[1] "model using elevation,stdev_elevation will be trained now..."
[1] "maximum number of models that still need to be trained: 48"
[1] "model using elevation,d2coast will be trained now..."
[1] "maximum number of models that still need to be trained: 47"
[1] "model using elevation,lc_class will be trained now..."
[1] "maximum number of models that still need to be trained: 46"
[1] "model using elevation,substrate will be trained now..."
[1] "maximum number of models that still need to be trained: 45"
[1] "model using elevation,elevation2 will be trained now..."
[1] "maximum number of models that still need to be trained: 44"
[1] "model using elevation,stdev_elevation2 will be trained now..."
[1] "maximum number of models that still need to be trained: 43"
[1] "model using elevation,d2coast2 will be trained now..."

I'm interested in two things:

  1. Why am I getting this error specifically for this model?

  2. What does this error message mean in general? Knowing that the ROC metrics are missing has not helped me, or the people in the Stack Overflow questions listed above, figure out what is wrong with our models. I haven't been able to identify a common theme in the potential solutions that have been suggested.
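One check that may be relevant to both questions, offered as a sketch rather than a diagnosis: an area under the ROC curve is only defined when the held-out observations contain both classes, so it can help to tabulate the response within each grouping unit used for the folds. The snippet below does this in base R for the reduced data set; `presence` and `region` are copied from `test_data` above, while `counts` and `both_classes` are names introduced here for illustration.

```r
# presence and Region_Code copied from test_data above (30 rows)
presence <- factor(c("no", "yes", rep("no", 28)), levels = c("no", "yes"))
region <- c(3L, 3L, rep(6L, 20), rep(10L, 8))

# Class counts per region
counts <- table(region, presence)
print(counts)

# TRUE for regions whose rows contain both "no" and "yes"
both_classes <- rowSums(counts > 0) == 2
print(both_classes)
```

In this reduced example only one of the 30 rows is "yes", so any split that holds out whole regions leaves some folds with a single class either in the training rows or in the held-out rows. Whether the same holds for the full 1800-row data set would need to be checked separately.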

Relevant session info:

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8 LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C LC_TIME=English_Canada.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pROC_1.18.0 CAST_0.7.1 caret_6.0-93 lattice_0.20-45 ggplot2_3.4.1

loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 terra_1.7-3 purrr_1.0.1 reshape2_1.4.4 listenv_0.9.0 splines_4.2.2 colorspace_2.1-0
[8] vctrs_0.5.2 generics_0.1.3 stats4_4.2.2 utf8_1.2.3 survival_3.4-0 prodlim_2019.11.13 rlang_1.0.6
[15] ModelMetrics_1.2.2.2 pillar_1.8.1 glue_1.6.2 withr_2.5.0 foreach_1.5.2 lifecycle_1.0.3 plyr_1.8.8
[22] lava_1.7.2.1 stringr_1.5.0 timeDate_4022.108 munsell_0.5.0 gtable_0.3.1 future_1.31.0 recipes_1.0.5
[29] codetools_0.2-18 parallel_4.2.2 class_7.3-20 fansi_1.0.4 Rcpp_1.0.10 scales_1.2.1 ipred_0.9-13
[36] parallelly_1.34.0 digest_0.6.31 stringi_1.7.12 dplyr_1.1.0 grid_4.2.2 hardhat_1.2.0 cli_3.6.0
[43] tools_4.2.2 magrittr_2.0.3 tibble_3.1.8 future.apply_1.10.0 pkgconfig_2.0.3 MASS_7.3-58.1 Matrix_1.5-3
[50] data.table_1.14.8 lubridate_1.9.2 timechange_0.2.0 gower_1.0.1 rstudioapi_0.14 iterators_1.0.14 R6_2.5.1
[57] globals_0.16.2 rpart_4.1.19 nnet_7.3-18 nlme_3.1-160 compiler_4.2.2