topepo / caret

caret (Classification And Regression Training): an R package that contains miscellaneous functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

if (tmps < .Machine$double.eps^0.5) 0 else tmpm/tmps error missing value... when using adaptive_cv #402

Closed farbodr closed 8 years ago

farbodr commented 8 years ago

I've been getting this error intermittently when using adaptive_cv with the gam model.

Error in if (tmps < .Machine$double.eps^0.5) 0 else tmpm/tmps : missing value where TRUE/FALSE needed

It doesn't happen all the time so I can't provide a good data set to reproduce the error.

FR
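For context, R raises this message whenever the condition of an `if` evaluates to NA. A minimal sketch of how that can happen with the expression from the error, assuming (this is not confirmed in the thread) that `tmpm` and `tmps` are a mean and a standard deviation computed from resampled performance values, one of which is missing:

```r
# Hypothetical resampled performance differences; the NA stands in for
# a resample whose model fit did not produce a usable value.
resample_diffs <- c(0.02, NA, -0.01)

# Assumed roles of the two names seen in the error message.
tmpm <- mean(resample_diffs)  # NA, because one input is NA
tmps <- sd(resample_diffs)    # NA for the same reason

# Same expression as in the error; tryCatch captures the message
# instead of stopping the script.
res <- tryCatch(
  if (tmps < .Machine$double.eps^0.5) 0 else tmpm / tmps,
  error = function(e) conditionMessage(e)
)
print(res)
```

If this reading is right, the thing to look for is whatever makes a resample return NA rather than the `if` statement itself.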

topepo commented 8 years ago

Can you at least share the code (and maybe str on the data) that generated the error along with the results of sessionInfo()?

farbodr commented 8 years ago

Sorry. Here is some more info. I just noticed that if I don't run in parallel every model fit results in a warning.

llSummaryFunction = function(data, lev, model){
  ll = mnLogLoss(data, lev, model) 
  ret = ll; names(ret) = 'logLoss'
  return(ret)
}

gamGrid <- expand.grid(select=c("GCV.Cp", "ML"), method=c(TRUE, FALSE))

fitControl <- trainControl(method = "adaptive_cv",
                           number = 5,
                           verboseIter = TRUE,
                           savePredictions="final",
                           classProbs = TRUE,
                           allowParallel = FALSE,
                           summaryFunction = llSummaryFunction,
                           adaptive = list(min = 4,
                                           alpha = 0.05,
                                           method = "gls",
                                           complete = TRUE))

gam1.model <- caret::train(trainData, 
                           trainClasses,
                           method="gam", 
                           tuneGrid=gamGrid,
                           trControl=fitControl,
                           metric="logLoss")

This results in the following warning, followed by the error:

+ Fold4.Rep1: select=GCV.Cp, method=FALSE 
model fit failed for Fold4.Rep1: select=GCV.Cp, 
method=FALSE Error in estimate.gam(G, method, optimizer, control, in.out, scale, gamma,  : 
unknown smoothness selection criterion

Here is the sessionInfo() output:

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
  [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
  [1] splines   parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mgcv_1.8-12         nlme_3.1-125        plyr_1.8.3          gbm_2.1.1
 [5] survival_2.38-3     glmnet_2.0-4        Matrix_1.2-4        caretEnsemble_2.0.0
 [9] caret_6.0-64        ggplot2_2.1.0       lattice_0.20-33     pls_2.5-0
[13] klaR_0.6-12         MASS_7.3-45         data.table_1.9.6    doMC_1.3.4
[17] iterators_1.0.8     foreach_1.4.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.3        compiler_3.2.3     nloptr_1.0.4       class_7.3-14       tools_3.2.3
 [6] lme4_1.1-11        digest_0.6.9       gtable_0.2.0       SparseM_1.7        gridExtra_2.2.1
[11] e1071_1.6-7        stringr_1.0.0      MatrixModels_0.4-1 stats4_3.2.3       combinat_0.0-8
[16] grid_3.2.3         nnet_7.3-12        pbapply_1.2-0      minqa_1.2.4        reshape2_1.4.1
[21] car_2.1-1          magrittr_1.5       scales_0.4.0       codetools_0.2-14   pbkrtest_0.4-6
[26] colorspace_1.2-6   quantreg_5.21      stringi_1.0-1      munsell_0.4.3      chron_2.3-47

topepo commented 8 years ago

You have the parameter names reversed; try this and see if it solves the issue:

gamGrid <- expand.grid(method =c("GCV.Cp", "ML"), select=c(TRUE, FALSE))
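A quick way to catch this kind of swap before calling train is to inspect the grid's column types. This is only a sketch: the `stringsAsFactors = FALSE` argument and the optional comparison against `caret::modelLookup("gam")` are additions for illustration and are skipped if caret is not installed.

```r
# Corrected grid: 'method' holds the smoothness selection criterion,
# 'select' is the logical flag.
gamGrid <- expand.grid(method = c("GCV.Cp", "ML"),
                       select = c(TRUE, FALSE),
                       stringsAsFactors = FALSE)
str(gamGrid)

# With the names reversed, 'method' would hold TRUE/FALSE and these
# checks would fail.
stopifnot(is.character(gamGrid$method), is.logical(gamGrid$select))

# Optional: compare the column names with the tuning parameters caret
# lists for the "gam" model.
if (requireNamespace("caret", quietly = TRUE)) {
  expected <- as.character(caret::modelLookup("gam")$parameter)
  stopifnot(setequal(names(gamGrid), expected))
}
```
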
farbodr commented 8 years ago

I was using tuneLength before switching to a grid and got the same result. Here is what I get when I use tuneLength:

fitControl <- trainControl(method = "adaptive_cv",
                           number = 5,
                           verboseIter = TRUE,
                           savePredictions="final",
                           classProbs = TRUE,
                           allowParallel = FALSE,
                           summaryFunction = llSummaryFunction,
                           adaptive = list(min = 4,
                                           alpha = 0.05,
                                           method = "gls",
                                           complete = TRUE))

gam1.model <- caret::train(trainData, 
                           trainClasses,
                           method="gam", 
                           tuneLength=2,
                           trControl=fitControl,
                           metric="logLoss")

I get this warning, followed by the error (I've omitted some of the messages in the middle):

+ Fold1.Rep1: select= TRUE, method=GCV.Cp 
- Fold1.Rep1: select= TRUE, method=GCV.Cp 
+ Fold1.Rep1: select=FALSE, method=GCV.Cp 
- Fold1.Rep1: select=FALSE, method=GCV.Cp 
+ Fold2.Rep1: select= TRUE, method=GCV.Cp 
model fit failed for Fold2.Rep1: select= TRUE, method=GCV.Cp Error in gam.fit3(x = X, y = y, sp = L %*% lsp1 + lsp0, Eb = Eb, UrS = UrS,  : 
 inner loop 3; can't correct step size

Error in if (tmps < .Machine$double.eps^0.5) 0 else tmpm/tmps : 
missing value where TRUE/FALSE needed

farbodr commented 8 years ago

This code throws the error every time with this dataset.

rm(list = ls())
library(caret)
library(mgcv)

load('train.RData')
llSummaryFunction = function(data, lev, model){
  ll = mnLogLoss(data, lev, model) 
  ret = ll; names(ret) = 'logLoss'
  return(ret)
}

fitControl <- trainControl(method = "adaptive_cv",
                           number = 5,
                           verboseIter = TRUE,
                           savePredictions="final",
                           classProbs = TRUE,
                           summaryFunction = llSummaryFunction,
                           adaptive = list(min = 4,
                                           alpha = 0.05,
                                           method = "gls",
                                           complete = TRUE))

gam1.model <- caret::train(trainData, 
                           trainClasses,
                           method="gam", 
                           tuneLength=2, #Grid=gamGrid,
                           trControl=fitControl,
                           metric="logLoss")

train.RData.zip

topepo commented 8 years ago

There seem to be two issues:

  1. There are predictors that, when resampled, only have a single value, and that causes the model to fail. You can avoid this using the option preProc = "zv".
  2. The GAM model has convergence issues (for whatever reason). I was able to get many of the models to complete by loosening the convergence criteria using the option control = gam.control(epsilon = 1e-04, mgcv.tol = 1e-04). You might need to loosen them further to get more to finish.

> gam1.model
Generalized Additive Model using Splines 

3430 samples
 158 predictor
   2 classes: 'NO', 'YES' 

Pre-processing: remove (10) 
Resampling: Adaptively Cross-Validated (5 fold, repeated 1 times) 
Summary of sample sizes: 2745, 2744, 2744, 2744, 2743 
Resampling results across tuning parameters:

  select  logLoss    Resamples
  FALSE   0.5032336  5        
   TRUE   0.5087389  5        

Tuning parameter 'method' was held constant at a value of GCV.Cp
logLoss was used to select the optimal model using  the smallest value.
The final values used for the model were select = FALSE and method = GCV.Cp.
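The two suggestions above can be sketched roughly as follows. The toy data frame, the fold indices, and the wrapper name `fit_gam` are made up for illustration; only the `preProc = "zv"` and `control = gam.control(...)` options come from the thread. The first part shows how a predictor can end up with a single value inside one resample; the second part is a wrapper around the call from the thread with both options added. The wrapper is only defined here, not run, since it needs caret, mgcv, and the original data.

```r
# Part 1: a predictor that is constant within one training fold.
set.seed(1)
x <- data.frame(a = rnorm(10), rare = c(1, rep(0, 9)))
fold_rows <- 2:10  # hypothetical training fold that leaves out row 1

zero_var <- vapply(x[fold_rows, ],
                   function(col) length(unique(col)) < 2,
                   logical(1))
dropped <- names(zero_var)[zero_var]
print(dropped)

# Part 2: the train call from the thread with both suggested options.
# 'fit_gam' is a hypothetical helper; its arguments are the objects
# used earlier in the thread.
fit_gam <- function(trainData, trainClasses, fitControl) {
  caret::train(trainData,
               trainClasses,
               method = "gam",
               tuneLength = 2,
               preProc = "zv",  # drop zero-variance predictors
               control = mgcv::gam.control(epsilon = 1e-04,
                                           mgcv.tol = 1e-04),
               trControl = fitControl,
               metric = "logLoss")
}
```

If some fits still fail, the tolerances passed to gam.control are the values to adjust.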