Range of penalty() in Ridge Regression does not cover lambda chosen by cv.glmnet

andrjohns commented 4 years ago


I'm trying to run a ridge regression with the tidymodels ecosystem, so I'm using the grid_regular function to create the search space of lambda values to evaluate. However, these lambda values are always bounded above by 1, whereas when I run the same data through cv.glmnet, the lambdas tested and chosen are greater than 1:

library(tidymodels)
library(glmnet)

dat = cbind(structure(rnorm(200*16),dim=c(200,16)),
            structure(sample(0:1,200*14,replace = TRUE),dim=c(200,14)))

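mod = linear_reg(mode="regression",penalty=tune(),mixture=0) %>%
        set_engine("glmnet")

# range of the default penalty grid
range(grid_regular(parameters(mod),levels=10)$penalty)
#> [1] 1e-10 1e+00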

fit1 = cv.glmnet(dat[,2:30],dat[,1],alpha=0)
range(fit1$lambda)
#> [1]   0.01971351 197.13506375
fit1$lambda.min
#> [1] 13.27537

Should I be using this functionality differently for ridge regression?

topepo commented 4 years ago

The range of lambda is data-driven and is affected by the choice of alpha. For that reason (and a few others), we use a default range of:

> penalty()
Amount of Regularization (quantitative)
Transformer:  log-10 
Range (transformed scale): [-10, 0]

which works well in 99% of the cases.
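
To make the transformed scale concrete, here is a minimal base-R sketch (no dials call involved) that converts the default log-10 range back to the penalty scale and checks where the lambda reported in the question falls. The variable names are illustrative only.

```r
# Default penalty() range on the log-10 (transformed) scale, as printed above
default_log10 <- c(-10, 0)

# Back-transform to the natural penalty (lambda) scale
default_natural <- 10^default_log10
default_natural

# Lambda reported by cv.glmnet in the question, moved to the log-10 scale
lambda_cv <- 13.27537
log10(lambda_cv)

# TRUE only if that value lies inside the default range
inside_default <- log10(lambda_cv) >= default_log10[1] &&
  log10(lambda_cv) <= default_log10[2]
inside_default
```

Since log10(13.27537) is positive, it sits above the upper end of the default range, so a default grid can never reach it.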

You can change the range if you need to go higher:

mod  <- 
  linear_reg(mode = "regression", penalty = tune(), mixture = 0) %>%
  set_engine("glmnet")

mod %>% 
  parameters() %>% 
  update(penalty = penalty(log10(c(0.02, 198)))) %>% 
  grid_regular(levels = 10) %>% 
  summary()
#>     penalty        
#>  Min.   :  0.0200  
#>  1st Qu.:  0.2232  
#>  Median :  2.2556  
#>  Mean   : 30.9259  
#>  3rd Qu.: 21.5277  
#>  Max.   :198.0000
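
The summary above is consistent with a grid that is evenly spaced on the log-10 scale and then back-transformed. The following base-R sketch reproduces that spacing by hand; it illustrates the idea rather than the dials implementation, and the variable names are made up for the example.

```r
# Requested penalty range on the natural scale
lower <- 0.02
upper <- 198

# Ten points evenly spaced on the log-10 scale, then back-transformed
log_grid <- seq(log10(lower), log10(upper), length.out = 10)
penalty_grid <- 10^log_grid

summary(penalty_grid)
```

Because the points are equally spaced in log10(lambda), most of them sit at the low end of the range, which is why the median is far below the mean.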

andrjohns commented 4 years ago

Excellent, thanks for the help Max!

