jolars / sgdnet

Fast Sparse Linear Models for Big Data with SAGA
https://jolars.github.io/sgdnet/
GNU General Public License v3.0

lambda_{max} for multivariate gaussian model #3

Open jolars opened 6 years ago

jolars commented 6 years ago

I am trying to figure out how glmnet picks lambda_max, the starting value for its regularization path, for the multivariate Gaussian model, but haven't been able to get it right. I understand that it should correspond to max_l |<x_l, y>| / (n * alpha).

This is just an example to illustrate and not the actual implementation as it stands:

library(glmnet)

get_lambda <- function(x, y, alpha = 1) {
  y <- as.matrix(y)

  # population standard deviation (divide by n rather than n - 1, as glmnet does)
  sd2 <- function(y) sqrt(sum((y - mean(y))^2)/length(y))

  sy <- apply(y, 2, sd2)
  sx <- apply(x, 2, sd2)

  # standardize the columns of x and y without the n - 1 correction
  xx <- as.matrix(scale(x, scale = sx))
  yy <- as.matrix(scale(y, scale = sy))

  # largest absolute inner product between predictors and responses,
  # rescaled back to the original scale of y
  lambda <- max(abs(crossprod(yy, xx)) * sy)/NROW(x)
  lambda/max(0.001, alpha)
}

# the univariate case
x <- as.matrix(iris[, 2:4])
y <- iris[, 1]

# looks fine!
max(glmnet(x, y)$lambda)
#> [1] 0.7194595
get_lambda(x, y)
#> [1] 0.7194595

# the multivariate case
x <- as.matrix(iris[, 3:4])
y <- iris[, 1:2]

# something is wrong
max(glmnet(x, y, "mgaussian")$lambda)
#> [1] 0.7431435
get_lambda(x, y)
#> [1] 0.7194595

coef(glmnet(x, y, "mgaussian", lambda = get_lambda(x, y)))
#> $Sepal.Length
#> 3 x 1 sparse Matrix of class "dgCMatrix"
#>                      s0
#>              5.79435769
#> Petal.Length 0.01303237
#> Petal.Width  .         
#> 
#> $Sepal.Width
#> 3 x 1 sparse Matrix of class "dgCMatrix"
#>                        s0
#>               3.070002986
#> Petal.Length -0.003371382
#> Petal.Width   .  

I feel like I am missing something silly here. Any ideas, @tdhock @michaelweylandt?

michaelweylandt commented 6 years ago

Two things come to mind:

i) mgaussian has a group lasso penalty, so the two problems are tied together and lambda_max won't simply be the max over each column separately; what you're computing is the lambda_max for fitting two Gaussians independently.

ii) IIRC, the standardization of y is a bit different for the Gaussian family than for other cases.

I can do the KKT analysis for the group lasso + mgaussian Tuesday and figure out what lambda_max should be - it's a holiday weekend in the US and I'm out of the office now though.

jolars commented 6 years ago

Alright, that sounds great!

michaelweylandt commented 6 years ago

As expected, you need to account for the "coupling" from the mgaussian (group lasso) to get things to match.

library(glmnet)

set.seed(125)

n <- 50
p <- 5
k <- 2

X <- matrix(rnorm(n * p), ncol=p)
y <- matrix(rnorm(n * k), ncol=k)

max(glmnet(X, y, family = "mgaussian",
           standardize = FALSE,
           standardize.response = FALSE,
           intercept = FALSE)$lambda)

# closed form: largest row-wise L2 norm of t(X) %*% y, divided by n
max(sqrt(rowSums(crossprod(X, y)^2))/n)

Both give lambda_max = 0.2187856. Standardization and intercepts seem to work as they do for the regular Gaussian.
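Wrapping that closed form in a small helper (a hypothetical name, only for the unstandardized, intercept-free setting above) and re-running it on the same seed and dimensions:

```r
# Hypothetical helper: in the unstandardized, no-intercept mgaussian case,
# lambda_max is the largest row-wise L2 norm of t(X) %*% Y, divided by n.
lambda_max_mgaussian <- function(X, Y) {
  G <- crossprod(X, as.matrix(Y))
  max(sqrt(rowSums(G^2))) / nrow(X)
}

set.seed(125)
X <- matrix(rnorm(50 * 5), ncol = 5)
Y <- matrix(rnorm(50 * 2), ncol = 2)

lambda_max_mgaussian(X, Y)
#> [1] 0.2187856
```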

Mathematically, the core idea here is in thinking through some relationships among dual norms and their associated unit balls, subdifferentials, and convex conjugates.

Super informally, with the regular lasso (L1 penalty), we get lambda_max by applying the dual norm (the L-infinity norm, i.e. the element-wise max) to crossprod(X, y), which is the "residual" when beta = 0. The dual norm of an Lp norm is the Lq norm for q satisfying 1/p + 1/q = 1. Here p = 1, so 1/p = 1 => 1/q = 0 => q = infinity.

For the group lasso (L1/L2 mixed norm), the dual is L-infinity/L2, so we take the maximum of the L2 norms of the rows of crossprod(X, y). Here, note that 1/2 + 1/2 = 1, so the Euclidean norm is its own dual.
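To make the dual-norm comparison concrete, here is a small base-R sketch on random data (illustration only, not the sgdnet implementation; no glmnet needed):

```r
set.seed(42)

n <- 50; p <- 5; k <- 2
X <- matrix(rnorm(n * p), ncol = p)
Y <- matrix(rnorm(n * k), ncol = k)

# p x k gradient of the squared-error loss at beta = 0 (up to a factor of n)
G <- crossprod(X, Y)

# Lasso, one response at a time: dual norm is L-infinity (element-wise max)
lambda_lasso <- apply(abs(G), 2, max) / n

# Group lasso across responses: max of the row-wise L2 norms
lambda_group <- max(sqrt(rowSums(G^2))) / n

# The coupled lambda_max dominates every per-response lasso lambda_max ...
stopifnot(lambda_group >= max(lambda_lasso))

# ... and the two formulas coincide when there is a single response (k = 1)
g1 <- G[, 1, drop = FALSE]
stopifnot(all.equal(max(abs(g1)) / n, max(sqrt(rowSums(g1^2))) / n))
```

A row L2 norm is always at least the absolute value of any single entry in that row, which is why the grouped lambda_max can only be larger than (or equal to) the per-response values, matching the discrepancy observed in the opening post.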

michaelweylandt commented 6 years ago

I expanded on the math a bit at https://stats.stackexchange.com/a/348850/61353

jolars commented 6 years ago

Thank you, Michael. Great write up!