marjoleinF / pre

an R package for deriving Prediction Rule Ensembles

elnet error #13

Closed topepo closed 6 years ago

topepo commented 6 years ago

The data set has a fair number of correlated predictors. No idea what would cause the error.

> library(QSARdata)
> library(pre)
> library(sessioninfo)
> 
> data(MeltingPoint)
> 
> mp_data <- MP_Descriptors
> mp_data$melt_point <- MP_Outcome
> 
> set.seed(36624)
> mod <- pre(melt_point ~ ., data = mp_data, verbose = TRUE)
A rule ensemble for prediction of a continuous response will be created.

A total of 500 trees and  3790 rules were generated initially.

A total of 39 generated rules were perfectly collinear with earlier rules and removed from the initial ensemble. 
($duplicates.removed and $complements.removed show which, if any).

An initial ensemble consisting of 3751 rules was succesfully created.

Error in elnet(x, is.sparse, ix, jx, y, weights, offset, type.gaussian,  : 
  NA/NaN/Inf in foreign function call (arg 5)
> 
> session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.0 (2018-04-23)
 os       macOS High Sierra 10.13.4   
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/New_York            
 date     2018-05-09                  

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package       * version date       source        
 clisymbols      1.2.0   2017-05-21 CRAN (R 3.5.0)
 codetools       0.2-15  2016-10-05 CRAN (R 3.5.0)
 earth           4.6.2   2018-03-21 CRAN (R 3.5.0)
 foreach         1.4.4   2017-12-12 CRAN (R 3.5.0)
 Formula         1.2-2   2017-07-10 CRAN (R 3.5.0)
 glmnet          2.0-16  2018-04-02 CRAN (R 3.5.0)
 inum            1.0-0   2017-12-12 CRAN (R 3.5.0)
 iterators       1.0.9   2017-12-12 CRAN (R 3.5.0)
 lattice         0.20-35 2017-03-25 CRAN (R 3.5.0)
 libcoin         1.0-1   2017-12-13 CRAN (R 3.5.0)
 magrittr        1.5     2014-11-22 CRAN (R 3.5.0)
 Matrix          1.2-14  2018-04-09 CRAN (R 3.5.0)
 mvtnorm         1.0-7   2018-01-26 CRAN (R 3.5.0)
 partykit        1.2-1   2018-04-20 CRAN (R 3.5.0)
 plotmo          3.3.6   2018-03-21 CRAN (R 3.5.0)
 plotrix         3.7     2017-12-07 CRAN (R 3.5.0)
 pre           * 0.5.0   2018-05-07 CRAN (R 3.5.0)
 QSARdata      * 1.3     2013-07-16 CRAN (R 3.5.0)
 rpart           4.1-13  2018-02-23 CRAN (R 3.5.0)
 sessioninfo   * 1.0.0   2017-06-21 CRAN (R 3.5.0)
 stringi         1.2.2   2018-05-02 CRAN (R 3.5.0)
 stringr         1.3.0   2018-02-19 CRAN (R 3.5.0)
 survival        2.42-3  2018-04-16 CRAN (R 3.5.0)
 TeachingDemos   2.10    2016-02-12 CRAN (R 3.5.0)
 withr           2.1.2   2018-03-15 CRAN (R 3.5.0)
 yaml            2.1.19  2018-05-01 CRAN (R 3.5.0)
> traceback()
4: elnet(x, is.sparse, ix, jx, y, weights, offset, type.gaussian, 
       alpha, nobs, nvars, jd, vp, cl, ne, nx, nlam, flmin, ulam, 
       thresh, isd, intr, vnames, maxit)
3: glmnet(x, y, weights = weights, offset = offset, lambda = lambda, 
       ...)
2: cv.glmnet(x, y, nfolds = nfolds, weights = weights, family = family, 
       parallel = par.final, standardize = standardize, ...)
1: pre(melt_point ~ ., data = mp_data, verbose = TRUE)

Side note: "succesfully" is misspelled. After years of ridiculously poor spelling, I have finally found someone else's typo.

marjoleinF commented 6 years ago

It's due to winsorizing and normalizing. The variable mp_data$a_nI is almost all zeros, so after winsorizing it is all zeros. It is then divided by its own standard deviation, which is zero, and consequently becomes NaN. Setting winsfrac = 0 or normalize = FALSE yields no error. This should at least issue an informative warning, though. Thanks for reporting the error (and the spelling mistake)!
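
The mechanism described above can be reproduced in isolation. The following is a minimal sketch in base R, not pre's internal code: the vector `x` is a made-up stand-in for a mostly-zero predictor such as `a_nI`, and the winsorizing fraction of 0.025 and the plain division by the standard deviation are assumptions made for illustration.

```r
# Made-up stand-in for a predictor that is almost all zeros.
x <- c(rep(0, 98), 1, 2)

# Winsorize: clamp values to the lower and upper quantiles.
# The fraction 0.025 is an assumption for this illustration.
winsfrac <- 0.025
lims <- unname(quantile(x, probs = c(winsfrac, 1 - winsfrac)))
x_wins <- pmin(pmax(x, lims[1]), lims[2])

# Both quantiles are 0 here, so the winsorized variable is constant.
sd_wins <- sd(x_wins)

# Dividing a constant-zero variable by its zero standard deviation
# computes 0 / 0, which is NaN for every observation.
x_norm <- x_wins / sd_wins

print(sd_wins)
print(all(is.nan(x_norm)))
```

A column of NaN values in the predictor matrix is the kind of input that triggers the "NA/NaN/Inf in foreign function call" error once the matrix is handed to compiled code.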

marjoleinF commented 6 years ago

Variables with (close to) zero variance are now no longer normalized and a warning is issued.
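
A guard of this kind could look roughly like the sketch below. This is not the code that was added to pre: `safe_normalize`, its `name` argument, and the tolerance `tol` are hypothetical names and values chosen only to illustrate skipping normalization and warning when the variance is (close to) zero.

```r
# Hypothetical helper (not pre's actual implementation): divide x by its
# standard deviation, unless that standard deviation is (close to) zero.
safe_normalize <- function(x, name = "x", tol = sqrt(.Machine$double.eps)) {
  s <- sd(x)
  if (is.na(s) || s < tol) {
    # Leave the variable untouched and tell the user why.
    warning("Variable '", name,
            "' has (close to) zero variance and was not normalized.")
    return(x)
  }
  x / s
}

constant <- rep(0, 100)
varying <- c(1, 2, 3, 4, 5)

# The constant variable is returned unchanged (with a warning, suppressed here).
unchanged <- suppressWarnings(safe_normalize(constant, "constant"))

# The varying variable is scaled to unit standard deviation.
scaled <- safe_normalize(varying, "varying")
```

Returning the variable unchanged keeps NaN values out of the predictor matrix, while the warning points the user to the offending column.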