scikit-learn-contrib / skglm

Fast and modular sklearn replacement for generalized linear models
http://contrib.scikit-learn.org/skglm
BSD 3-Clause "New" or "Revised" License

square-root LASSO implementation #53

Closed: asberk closed this issue 1 year ago

asberk commented 2 years ago

As suggested by @agramfort, I am opening an issue here (original) to make a feature request for the square-root LASSO.

Describe the workflow you want to enable

The unconstrained LASSO is currently implemented in lasso_path. However, its regularization parameter is sensitive to the measurement noise level. The square-root LASSO instead has objective norm(A x - b, 2) + alpha * norm(x, 1), and its parameter alpha is insensitive to the noise level. I would like an sklearn-style estimator (with fit, predict, etc.) coupled with a lasso_path-like function, srlasso_path, that solves the square-root LASSO optimization program.

Describe your proposed solution

I propose an sklearn-friendly wrapper around the cvxpy or cvxopt interface. More ambitiously, I propose a fast native solver based on [1] or [2], via (accelerated) PGD or ADMM.
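To make the first option concrete, here is a minimal sketch of such a wrapper (the function name sqrt_lasso_cvxpy is mine, and it uses whatever default solver CVXPY picks):

```python
import cvxpy as cp

def sqrt_lasso_cvxpy(A, b, alpha):
    """Solve min_x norm(A @ x - b, 2) + alpha * norm(x, 1) with CVXPY."""
    x = cp.Variable(A.shape[1])
    objective = cp.Minimize(cp.norm2(A @ x - b) + alpha * cp.norm1(x))
    cp.Problem(objective).solve()
    return x.value
```

An sklearn estimator would just call this from fit and store x.value as coef_; the point is that the whole problem statement fits in three lines of CVXPY, at the cost of speed.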

[1] "On fast convergence of proximal algorithms for SQRT-LASSO optimization: Don’t worry about its nonsmooth loss function" by Li, Xinguo, Jiang, Haoming, et al. (2020) PMLR. [2] "The flare package for high dimensional linear regression and precision matrix estimation in R" by Li, Xingguo, Zhao, Tuo, Yuan, Xiaoming and Liu, Han. (2015) JMLR.

Describe alternatives you've considered, if relevant

Currently, statsmodels offers a square-root LASSO solver implemented via the Python package cvxopt, but this approach is slow. Using cvxpy with another interior-point solver like MOSEK offers only about a 2x speed-up; both remain extremely slow compared to lasso_path.

Separately, some work has shown that, if the problem is appropriately regularized to avoid over-fitting, accelerated proximal gradient descent converges for this objective. However, the paper in question ([1] above) leaves several critical implementation details unspecified.
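For intuition, here is a plain (non-accelerated) proximal gradient sketch of the idea. The step-size rule is my own crude local-smoothness estimate, not the safeguarded scheme analyzed in [1]:

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t * norm(., 1): elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sqrt_lasso_pgd(A, b, alpha, n_iter=1000, tol=1e-10):
    """Proximal gradient on norm(A x - b, 2) + alpha * norm(x, 1).

    The loss is differentiable wherever A x != b, with gradient
    A.T @ r / norm(r). Iterations are well defined only away from
    the nonsmooth point r = 0, which is where [1]'s analysis matters.
    """
    x = np.zeros(A.shape[1])
    op_norm_sq = np.linalg.norm(A, 2) ** 2  # squared spectral norm of A
    for _ in range(n_iter):
        r = A @ x - b
        nrm = np.linalg.norm(r)
        if nrm < tol:
            break  # residual near zero: gradient undefined, stop
        # local Lipschitz constant of the gradient is ~ ||A||^2 / ||r||,
        # so use its inverse as a heuristic step size
        step = nrm / op_norm_sq
        x = soft_threshold(x - step * (A.T @ r) / nrm, step * alpha)
    return x
```

The missing details alluded to above are exactly the safeguards around r = 0 and the step-size/line-search strategy.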

Additional context

Note that the characterization of the square-root LASSO as a second-order cone program (SOCP) is given in [3]; a sketch of the reformulation follows the reference below.

[3] Belloni, A., Chernozhukov, V., and Wang, L. (2011). "Square-root LASSO: Pivotal recovery of sparse signals via conic programming." https://arxiv.org/pdf/1009.5689.pdf
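For reference, the standard epigraph reformulation behind [3] (my paraphrase, not a quote from the paper) is

minimize_{x, t}   t + alpha * norm(x, 1)   subject to   norm(A x - b, 2) <= t

which becomes an SOCP once the l1 term is expanded into linear constraints via auxiliary variables; the norm constraint is exactly a second-order cone.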

mathurinm commented 2 years ago

Hi Aaron, thanks for this proposal. At this stage I would prefer not to rely on cvxpy for our package, as it is an additional dependency and is not really suited to the large-scale problems we want to tackle.

The approach of [1] would be interesting in my opinion. As a first step, do you feel like implementing the algorithm? We can help with feedback and discussion.

Another interesting approach is [4], which allows using standard solvers such as coordinate descent, and yields the same solution as the sqrt Lasso provided the smoothing parameter is low enough (see the sketch after the reference below).

[4] Ndiaye, E., Fercoq, O., Gramfort, A., Leclère, V., and Salmon, J. "Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression."
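A minimal sketch of the alternating scheme, assuming the smoothed concomitant objective norm(A x - b, 2)**2 / (2 n sigma) + sigma / 2 + lam * norm(x, 1) with sigma >= sigma_0 from [4]; the function name, iteration count, and the use of sklearn's Lasso as the inner coordinate-descent solver are my choices, not the paper's:

```python
import numpy as np
from sklearn.linear_model import Lasso

def smoothed_concomitant_lasso(A, b, lam, sigma_0=1e-2, n_iter=20):
    """Alternate a closed-form sigma update with a standard Lasso step."""
    n = A.shape[0]
    x = np.zeros(A.shape[1])
    sigma = sigma_0
    for _ in range(n_iter):
        # closed-form noise-level update, clipped at sigma_0 (the smoothing)
        sigma = max(np.linalg.norm(b - A @ x) / np.sqrt(n), sigma_0)
        # for fixed sigma the x-step is a plain Lasso: sklearn minimizes
        # norm(b - A x)**2 / (2 n) + alpha * norm(x, 1), so alpha = sigma * lam
        x = Lasso(alpha=sigma * lam, fit_intercept=False).fit(A, b).coef_
    return x, sigma
```

This is exactly why the approach is attractive: the inner step can reuse any existing coordinate-descent Lasso solver unchanged.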

asberk commented 2 years ago

Hi Mathurin, I suggested a CVXPY wrapper mostly because it is easily achievable; the ideal, however, would be a (much) faster solver. I can take a stab at implementing [1] in Python. I think it would require us to put our heads together to fill in some of the missing details in order to get a reliable solver whose output agrees with CVXPY in the regime where the approximation holds.

Before I look into [1] further, I'll look at [4] first. If [4] could be implemented using sklearn's _cd_fast.pyx, that would be a boon. It is also nice that a co-author of skglm is a co-author on the paper. One question: it looks like [4] also requires that the regularization parameter be large enough (Remark 1), or am I missing something?