Open jolars opened 6 years ago
Two things come to mind:
i) mgaussian
has a group lasso penalty so the two problems are tied together and it won't just be the max for each columns - what you're trying is just the lambda_max for fitting two gaussian
independently
ii) IIRC, the standardization of y
is a bit different for the Gaussian family than other cases.
I can do the KKT analysis for the group lasso + mgaussian Tuesday and figure out what lambda_max should be - it's a holiday weekend in the US and I'm out of the office now though.
Alright, that sounds great!
As expected, you need to account for the "coupling" from the mgaussian
(group lasso) to get things to match.
library(glmnet)
set.seed(125)
n <- 50
p <- 5
k <- 2
X <- matrix(rnorm(n * p), ncol=p)
y <- matrix(rnorm(n * k), ncol=k)
max(glmnet(X, y, family="mgaussian",
standardize = FALSE,
standardize.response = FALSE,
intercept=FALSE)$lambda)
max(sqrt(rowSums(crossprod(X, y)^2))/n)
Both give lambda_max = 0.2187856
. Standardization and intercepts seem to work as they do for the regular Gaussian.
Mathematically, the core idea here is in thinking through some relationships among dual norms and their associated unit balls, subdifferentials, and convex conjugates.
Super informally, with the regular lasso (L1 penalty), we get lambda_max by applying the dual norm (the L-Infinty norm = element-wise max) to crossprod(X, y)
which is the "residual" when beta = 0
. The dual norm of an Lp norm is the Lq norm for q satisfying 1/p + 1/q = 1. Here p = 1, so 1/p = 1 => 1/q = 0 => q = infinity.
For the group lasso (L1/L2 mixed norm), the dual is L-Infinity/L2, so we take the element-wise max of the 2 norms for each row of crossprod(X, y)
. Here, note that 1/2 + 1/2 = 1, so the Euclidean norm is its own dual.
I expanded on the math a bit at https://stats.stackexchange.com/a/348850/61353
Thank you, Michael. Great write up!
I am trying to figure out how glmnet picks lambda_max, the starting value for its regularization path, for the multivariate gaussian model but haven't been able to get it right. I understand that it should correspond to max_l 1/(n*alpha)|<x_l, y>|
This is just an example to illustrate and not the actual implementation as it stands:
I feel like this is something silly that I am missing. Do you know what I am missing @tdhock @michaelweylandt ?