topepo / caret

caret (Classification And Regression Training): an R package containing miscellaneous functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

"Something is wrong; all the Accuracy metric values are missing:" #160

Closed randomjohn closed 9 years ago

randomjohn commented 9 years ago

I have done, from a clean R 3.2.0 (x64 Windows 8) installation:

library(caret)
data(iris)
iris2 <- iris[iris$Species %in% c("virginica","setosa"),]
mod1 <- train(Species~Sepal.Length+Sepal.Width,data=iris2)

note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 26 warnings (use warnings() to see them)
> defaultSummary()
Error in defaultSummary() : argument "data" is missing, with no default
> fix(defaultSummary)
> mod1 <-     train(Species~Sepal.Length+Sepal.Width,data=iris2,trainControl=trainControl(summaryFunction=defaultSummary))
note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 26 warnings (use warnings() to see them)

warnings()

Gives me:

Warning messages:
1: In eval(expr, envir, enclos) :
  model fit failed for Resample01: mtry=2 Error in randomForest.default(x, y, mtry = param$mtry,     ...) : 
  Can't have empty classes in y.

2: In eval(expr, envir, enclos) :
  model fit failed for Resample02: mtry=2 Error in randomForest.default(x, y, mtry = param$mtry,     ...) : 
  Can't have empty classes in y.
(and so forth)

This is about as barebones an example as I can give, but basically caret is now unusable as a classification tool because of the above.

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] randomForest_4.6-10 caret_6.0-47        ggplot2_1.0.1      
[4] lattice_0.20-31    

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6         compiler_3.2.0      nloptr_1.0.4       
 [4] plyr_1.8.2          iterators_1.0.7     class_7.3-12       
 [7] tools_3.2.0         digest_0.6.8        lme4_1.1-7         
[10] nlme_3.1-120        gtable_0.1.2        mgcv_1.8-6         
[13] Matrix_1.2-0        foreach_1.4.2       parallel_3.2.0     
[16] brglm_0.5-9         SparseM_1.6         proto_0.3-10       
[19] e1071_1.6-4         BradleyTerry2_1.0-6 stringr_1.0.0      
[22] gtools_3.5.0        grid_3.2.0          nnet_7.3-9         
[25] minqa_1.2.4         reshape2_1.4.1      car_2.0-25         
[28] magrittr_1.5        scales_0.2.4        codetools_0.2-11   
[31] MASS_7.3-40         splines_3.2.0       pbkrtest_0.4-2     
[34] colorspace_1.2-6    quantreg_5.11       stringi_0.4-1      
[37] munsell_0.4.2      
> 
topepo commented 9 years ago

Eliminating one of the classes of the factor still keeps the same levels:

> levels(iris2$Species)
[1] "setosa"     "versicolor" "virginica" 

and ROC curves require a factor with two levels. Reset them via:

> iris2$Species <- factor(as.character(iris2$Species))
> levels(iris2$Species)
[1] "setosa"    "virginica"

and mod1 works fine. The warning "Can't have empty classes in y." doesn't do a good job of helping you figure that out.

Max
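The level reset described above can also be done with base R's droplevels(), which removes unused levels from every factor column of a data frame. This is a minimal sketch using only base R and the iris data from the original report; it does not call caret.

```r
# Reproduce the subset from the original report
data(iris)
iris2 <- iris[iris$Species %in% c("virginica", "setosa"), ]

# The subset has no versicolor rows, but the level is still attached
table(iris2$Species)

# droplevels() removes levels that no longer occur in the data
iris2 <- droplevels(iris2)
levels(iris2$Species)
```

After this step the outcome has exactly the two levels that are present in the rows, which is what the model fit needs.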

alexWhitworth commented 9 years ago

I'm not sure that this is the appropriate issue to mention this, since it seems that the problem above is from improperly specifying factors; but I'm getting the same error message:

Full details here http://stackoverflow.com/questions/33088893/caret-random-forests-not-working-something-is-wrong-all-the-accuracy-metric

topepo commented 9 years ago

I've added a check in the new version to make sure that the outcome is a factor with non-zero frequencies. You can relevel the outcome prior to fitting the model.

For predictors, I'm in the process of adding options to preProcess for removing zero- and near-zero-variance predictors.
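A pre-flight check along these lines can be sketched in base R. The two helpers below, empty_levels() and zero_variance(), are hypothetical names made up for illustration; they are not caret functions and do not reproduce caret's actual check.

```r
# Hypothetical helper: levels of a factor outcome that have zero observations
empty_levels <- function(y) {
  stopifnot(is.factor(y))
  counts <- tabulate(y, nbins = nlevels(y))
  levels(y)[counts == 0]
}

# Hypothetical helper: names of columns that hold a single unique value
zero_variance <- function(df) {
  is_const <- vapply(df, function(col) length(unique(col)) <= 1, logical(1))
  names(df)[is_const]
}

# Toy data: level "c" is declared but never observed, and x2 is constant
y <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
df <- data.frame(x1 = c(1, 2, 3), x2 = c(5, 5, 5))

empty_levels(y)
zero_variance(df)
```

If either call returns a non-empty result, relevel the outcome or drop the flagged columns before fitting.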

alexWhitworth commented 9 years ago

As I mentioned, I'm not sure this is the appropriate issue to comment in, since my issue isn't really the same as the above--it's just the same error message.

@topepo My outcome has no levels with zero frequencies, and none of my predictors have zero or near-zero variance. It's possible that this is not true of the test data that I have on SO; I'll double-check and update if so.

Updated: So, in my test data, variable "x7" does have a very small variance. But, if I remove this variable (test$x7 <- NULL), I still get the same errors.

pverspeelt commented 9 years ago

I tested Alex's problem. The error is related to missing values in the resampled performance measures. Interestingly, if you change the method from "cforest" or "parRF" to "rf", it works in parallel.

I also tested "cforest" without running it in parallel, and then it works. It looks like the parallel option conflicts with building the resampled performance measures when using the methods "cforest" or "parRF".

topepo commented 9 years ago

For Alex's problem, here is the answer that I posted on SO:

When I run the first cforest model, I can see that "In addition: There were 31 warnings (use warnings() to see them)". These say that

unused arguments (verbose = FALSE, proximity = FALSE, importance = TRUE)

These are arguments to the randomForest function and not cforest. Removing them removes the errors.
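The mechanism behind this can be illustrated without caret. In the sketch below, fit_a(), fit_b(), and fit_with() are hypothetical stand-ins: two functions with different signatures and a wrapper that forwards extra arguments through `...`, in the way the answer above describes extra arguments reaching the underlying model function. It only shows why an argument accepted by one function fails in another.

```r
# Hypothetical stand-ins for two model functions with different signatures
fit_a <- function(x, importance = FALSE, proximity = FALSE) "fit_a ok"
fit_b <- function(x) "fit_b ok"

# Hypothetical wrapper: forwards extra arguments and captures any error message
fit_with <- function(fit, x, ...) {
  tryCatch(fit(x, ...), error = function(e) conditionMessage(e))
}

# fit_a accepts 'importance', so the call succeeds
res_a <- fit_with(fit_a, 1:3, importance = TRUE)

# fit_b has no such argument, so R raises an "unused argument" error
res_b <- fit_with(fit_b, 1:3, importance = TRUE)

res_a
res_b
```

When every resample fails this way, no performance values are computed, which is consistent with the "all the Accuracy metric values are missing" message reported in this thread.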

alexWhitworth commented 9 years ago

I have updated the SO post. I'm still getting errors.

RobertFeyerharm commented 8 years ago

I received the same error message when running gbm in caret. I finally corrected the problem by removing the allowParallel=TRUE argument from the train() function.

ptagne commented 8 years ago

Hello, I am getting the error below when working locally on my laptop: Error in train.default(x, y, weights = w, ...) : The tuning parameter grid should have columns trials, model, winnow. I had caret version 6.0.72, and it didn't work. I removed it and installed version 6.0.35, and I am still getting the same error. Any thoughts, please?

ptagne commented 8 years ago

I got the error message when running the below command:

set.seed(300)
m <- train(as.factor(default) ~ ., data = credit_resample, method = "C5.0", metric = "kappa", trControl = ctrl, tuneGrid = grid)

Thanks
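The error text names the columns the grid must contain. The grid and ctrl objects used in the command were not posted, so the following is only a sketch of a grid with the required column names; the candidate values are assumptions chosen for illustration.

```r
# Assumed candidate values; only the column names come from the error message
grid <- expand.grid(
  trials = c(1, 10, 20),
  model  = c("tree", "rules"),
  winnow = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Confirm that no required column is missing before passing the grid to train()
required <- c("trials", "model", "winnow")
missing_cols <- setdiff(required, names(grid))
missing_cols
nrow(grid)
```

Separately, caret reports the classification metrics as "Accuracy" and "Kappa", so metric = "kappa" in lower case may not match; that is worth checking but is not necessarily the cause of this error.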

topepo commented 8 years ago

@ptagne we can't really do much without a small reproducible example and the results of sessionInfo.

ptagne commented 8 years ago

I downloaded caret version 6.0.35 and replaced caret version 6.0.73 and it worked. Thanks, Pascal.


AkhilChilakala commented 7 years ago

How do I downgrade the caret version?

ViniciusBRodrigues commented 6 years ago

@pverspeelt's tip worked fine for me.

dnentchev commented 4 years ago

I had a similar problem. In my case I was using rfeControl(functions=caretFunctions) for a binary logistic regression. It was resolved when I changed that to rfeControl(functions=lrFuncs).

vanaris commented 4 years ago

I am working through your Applied Predictive Modeling text. On problem 12.3.c, I am getting this error for ROC or accuracy on the churn data. However, I am loading the data with library(modeldata) and data("mlc_churn"), as CS50 is no more.

I did the following, and I suspect that there might be some linear combinations going on, but I am really new to that concept, and the trim function did not discard any values for the numerical predictors.

I've tried cutting the integer values into various levels, and I am doing my best not to have any zero-frequency classes. This data is really imbalanced, and that posed a challenge. I had the same issue when leaving the integers as integers.

[Image: the splits I used for converting them to factors]

Using LRA with glm or multinomial gets around 83% accuracy and an ROC of 0.85.

I'm using

ctrl <- trainControl(method = "LGOCV", summaryFunction = twoClassSummary, classProbs = TRUE, savePredictions = TRUE)

and split the data like so:

Churn <- mlc_churn$churn
training <- createDataPartition(Churn, p = .8, list = FALSE)
trainPreds <- preds[training,]
testPreds <- preds[-training,]
trainChurn <- Churn[training]
testChurn <- Churn[-training]

All other models in chapter 12 are throwing this error, and I can't figure out what I am doing wrong. I had no issues on problems 12.1 or 12.2, so I must be missing something in pre-processing.

I hope that is enough detail to be pointed in the right direction.

Example of the error:

set.seed(1)
plsChurn <- train(x = trainPreds, y = trainChurn, method = "pls", tuneGrid = expand.grid(.ncomp = 1:15), preProc = c("center","scale"), metric = "ROC", probMethod = "Bayes", trControl = ctrl)

Something is wrong; all the ROC metric values are missing:
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)

50!

The same happens for linear discriminant analysis. When I try lda(x=trainPredsPreProcess, grouping=trainChurn), it builds the model; however, when I try to predict with it, I get the following error:

Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) : non-numeric argument to binary operator
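One possible cause, offered as an assumption because the preds object was not posted: the post says some integer columns were cut into factors, and the x/y interface receives the predictors as they are. A data frame that mixes numeric and factor columns becomes a character matrix under as.matrix(), and arithmetic on it fails with a "non-numeric argument" error like the one quoted for lda. The sketch below uses a made-up toy data frame (preds_toy is hypothetical) and base R's model.matrix() to expand factors into numeric indicator columns.

```r
# Hypothetical toy predictors: one numeric column and one factor column
preds_toy <- data.frame(
  minutes = c(10.5, 20.1, 5.3, 40.2),
  plan    = factor(c("a", "b", "a", "c"))
)

# Mixed column types collapse to a character matrix, which cannot be centered or scaled
mixed_is_character <- is.character(as.matrix(preds_toy))

# model.matrix() expands each factor into 0/1 indicator columns;
# the first column is the intercept, which is dropped here
x_num <- model.matrix(~ ., data = preds_toy)[, -1, drop = FALSE]

mixed_is_character
colnames(x_num)
is.numeric(x_num)
```

If the real predictors do contain factors, passing an all-numeric matrix built this way, or using the formula interface, is worth trying before looking for other causes.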