mathurinm / celer

Fast solver for L1-type problems: Lasso, sparse Logistic regression, Group Lasso, weighted Lasso, Multitask Lasso, etc.
https://mathurinm.github.io/celer/
BSD 3-Clause "New" or "Revised" License

Feature request: the LLA algorithm for general folded concave penalties #185

Open idc9 opened 3 years ago

idc9 commented 3 years ago

I am very happy to see that someone is implementing the adaptive Lasso in Python (#169)! It would be great if celer also implemented the more general LLA algorithm for arbitrary folded concave penalties, e.g. see One-step sparse estimates in nonconcave penalized likelihood models (Zou and Li, 2008) and Strong oracle optimality of folded concave penalized estimation (Fan et al., 2014). The LLA algorithm is a mild but statistically very nice generalization of AdaptiveLasso.

The main differences between the general LLA algorithm and AdaptiveLasso are:

  1. LLA typically uses a better initializer, e.g. a Lasso solution or simply 0, instead of the least squares solution
  2. LLA allows for different penalties (e.g. with SCAD, the LLA algorithm satisfies the desirable strong oracle property)

The LLA algorithm should be fairly straightforward to implement, though I'm not yet very familiar with the backend of celer.

LLA algorithm sketch

User input:

  1. tuning parameter \lambda
  2. concave penalty function g_{\lambda} (e.g. SCAD, MCP)
  3. initial value, \beta^0
  4. stopping criterion: either (A) stop after s = 1 step (the so-called "one-step estimator") or (B) run until convergence

for s = 1, 2, ...

  w^s = compute Lasso weights at the current guess \beta^{s-1}

  \beta^s = solve the weighted Lasso problem using weights w^s

  check the stopping criterion
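For concreteness, the sketch above can be prototyped on top of scikit-learn, solving each weighted Lasso subproblem by rescaling the columns of X (a standard trick, since sklearn's Lasso has no per-feature weights). The names `scad_derivative` and `lla` are made up for illustration, and a real implementation would of course use celer's solver instead:

```python
import numpy as np
from sklearn.linear_model import Lasso

def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan & Li, 2001), used as Lasso weights."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))

def lla(X, y, lam, weight_fn, beta0=None, n_steps=1):
    """LLA: repeatedly solve a weighted Lasso with weights w_j = g'_lam(|beta_j^{s-1}|)."""
    p = X.shape[1]
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    for _ in range(n_steps):
        w = weight_fn(beta, lam)
        w = np.maximum(w, 1e-6)  # w_j = 0 means "unpenalized"; floor it to avoid /0
        # weighted Lasso via rescaling: with gamma_j = w_j * beta_j, the penalty
        # sum_j w_j |beta_j| becomes a plain L1 penalty on gamma
        est = Lasso(alpha=1.0, fit_intercept=False, max_iter=50_000).fit(X / w, y)
        beta = est.coef_ / w
    return beta
```

With `beta0=None` the first pass is a plain Lasso at level `lam` (all weights equal `lam`), so `n_steps=2` from a zero start corresponds to a Lasso initializer followed by one LLA step.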

josephsalmon commented 3 years ago

Thanks for the proposition and the references! Actually we are working on that.

We mostly need a better name than AdaptiveLasso, since what we are targeting is exactly what you describe. Moreover, this majorization/minimization approach goes by many names in the literature (e.g. LLA, DC programming, iteratively reweighted L1).

We'll try to choose a more suitable name then... I would lean toward "ReweightedL1", since the name makes clear what it does (and it could also then be adapted to logistic regression versions). wdyt @agramfort @mathurinm @arakotom

idc9 commented 3 years ago

Great to hear!

I agree that reweighted L1 is a good name, i.e. it tells you exactly what the algorithm does. That being said, the stats literature tends to use LLA (e.g. see the above references). This is relevant because of the importance of the one-step version of this algorithm, which many users may want.

mathurinm commented 3 years ago

Hello @idc9, thanks for the interest. For the reweighting scheme, I plan to have a few string options corresponding to known penalties (Lq, log), but the user will also be able to pass a callable so that at iteration iter + 1, weights[j] = weight(solution[iter][j]). This should allow a lot of flexibility, together with the n_reweightings setting.
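To make the proposed interface concrete, here is what such weight callables might look like for the log and Lq options mentioned above (function names and smoothing constants are my own, purely illustrative):

```python
# Hypothetical weight callables for the proposed interface, where at
# iteration iter + 1 one sets weights[j] = weight(solution[iter][j]).

def log_weight(coef_j, eps=1e-3):
    """Reweighting from the log penalty log(eps + |t|): weight = 1 / (eps + |t|)."""
    return 1.0 / (eps + abs(coef_j))

def lq_weight(coef_j, q=0.5, eps=1e-3):
    """Reweighting from the Lq pseudo-norm |t|^q, smoothed near zero."""
    return q * (abs(coef_j) + eps) ** (q - 1.0)
```

In both cases large coefficients receive small weights (little shrinkage) and small coefficients receive large weights, which is the mechanism behind the debiasing effect.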

As far as the naming goes, I think IterativeReweightedLasso is more explicit and corresponds to the truth; we can duplicate the class as AdaptiveLasso, which seems to be the most popular name. It'll require a bit of explanation in the docstring. I'll work on that in the upcoming month.

idc9 commented 3 years ago

Awesome! It might be worth adding SCAD and MCP to the defaults because of their nice statistical properties (and it's hard to find these penalties implemented in Python!)
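For reference, the MCP reweighting is just as simple as SCAD's; a possible sketch (function name hypothetical), following Zhang (2010)'s definition of the penalty derivative:

```python
import numpy as np

def mcp_derivative(beta, lam, gamma=3.0):
    """MCP derivative (Zhang, 2010): lam - |t|/gamma for |t| <= gamma*lam, 0 beyond.

    Used as Lasso weights, coefficients above gamma*lam become unpenalized,
    which is what produces the debiasing / oracle-type behavior.
    """
    return np.maximum(lam - np.abs(beta) / gamma, 0.0)
```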

josephsalmon commented 3 years ago

I totally agree for MCP (though I found SCAD overrated: https://hal.inria.fr/hal-01267701v2/document, section 5.2).

Moreover, in terms of history, I found an even earlier work where the iterative reweighting scheme was used for non-convex sparse regularization, in the signal/image community (if you know of older work, that would be nice):

"Sparse Multinomial Logistic Regression:Fast Algorithms and Generalization Bounds" Balaji Krishnapuram, Lawrence Carin, Mario A.T. Figueiredo and Alexander J. Hartemink http://www.lx.it.pt/~mtf/Krishnapuram_Carin_Figueiredo_Hartemink_2005.pdf

It does not help for the naming though... And I'm fine with IterativeReweightedLasso.

arakotom commented 3 years ago

Hi all, sorry for being late on this. IterativeReweightedLasso is a good naming and clear enough regardless of the underlying theoretical framework (DC or LLA).

Is there any point where I can help, or have most of the things already been done?