hichamjanati / mutar

Multi-task regression in Python
BSD 3-Clause "New" or "Revised" License

Feature request: simultaneous lasso with l1,infinity or l1,lq norm penalization #7

Open tomwenseleers opened 4 years ago

tomwenseleers commented 4 years ago

I was wondering whether it might also be possible in the future to support l1/l∞ penalization, known as the "simultaneous LASSO". See:

- Turlach 2005, https://www.tandfonline.com/doi/pdf/10.1198/004017005000000139
- Liu et al. 2009, https://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/fmri/papers/168-Blockwise-Coord-Descent.pdf (maybe the best and fastest algorithm; confusingly called multi-task LASSO there, but it uses an l1/l∞ penalty rather than the l1/l2 penalty of sklearn's MultiTaskLasso)
- Quattoni et al. 2009, https://dspace.mit.edu/bitstream/handle/1721.1/59367/Collins_An%20efficient.pdf?sequence=1&isAllowed=y
- Vogt & Roth 2010, https://www.researchgate.net/profile/Volker_Roth/publication/262409253_The_group-lasso_l_1_regularization_versus_l_12_regularization/links/09e41512b178be6c04000000/The-group-lasso-l-1-regularization-versus-l-1-2-regularization.pdf

Or its generalization, l1/lq penalization; see https://arxiv.org/pdf/1009.4766

This creates fits with greater sparsity than l1/l2 and fewer false positives, so it could be useful for many multi-task learning applications with a shared sparsity structure... But I haven't found any open implementations anywhere...
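For concreteness, here is a minimal sketch (not part of mutar, purely illustrative) of the row-wise block penalties being compared, on a coefficient matrix `W` of shape `(n_features, n_tasks)`. The function names and shapes are assumptions made for this example:

```python
import numpy as np

def l1_l2_penalty(W):
    """l1/l2 block penalty: sum over features of the l2 norm of each row
    (the penalty used by sklearn's MultiTaskLasso)."""
    return np.sum(np.sqrt(np.sum(W ** 2, axis=1)))

def l1_linf_penalty(W):
    """l1/l-infinity block penalty: sum over features of the largest absolute
    coefficient in each row ("simultaneous LASSO")."""
    return np.sum(np.max(np.abs(W), axis=1))

def l1_lq_penalty(W, q):
    """General l1/lq block penalty: sum over features of the lq norm of each row."""
    return np.sum(np.sum(np.abs(W) ** q, axis=1) ** (1.0 / q))

rng = np.random.RandomState(0)
W = rng.randn(5, 3)  # 5 features shared across 3 tasks
print(l1_l2_penalty(W), l1_linf_penalty(W), l1_lq_penalty(W, q=3))
```

All three penalties vanish on a row exactly when every task's coefficient on that feature is zero, which is what induces the shared sparsity pattern across tasks.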

hichamjanati commented 4 years ago

I am not aware of any implementation of a general l1/lq regularization either. It would be a good feature to add to mutar. Even the Dirty models approach (https://papers.nips.cc/paper/4125-a-dirty-model-for-multi-task-learning), which decouples the variables into x = x_1 + x_2 and applies an l1/lq penalty on x_1 and an l1 penalty on x_2, was proposed with q = infty but was implemented in mutar with q = 2 for the simplicity of the algorithm. To update x_1 they use an l1/l∞ iteration, and their pseudo-code seems more comprehensive than that of the other papers (see page 24 of the appendix). It should be straightforward to implement; a rough sketch of such an l1/l∞ update is given below.
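As a sketch only (assuming a proximal-gradient framing for illustration, not necessarily the exact iteration from the paper's appendix): the proximal operator of lam * ||v||_inf can be computed via Moreau decomposition as v - lam * proj(v / lam), where proj is the Euclidean projection onto the unit l1 ball, and the block update then applies this prox row by row.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of a vector onto the l1 ball of the given radius."""
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam):
    """Proximal operator of lam * ||v||_inf, via Moreau decomposition."""
    if lam <= 0:
        return v.copy()
    return v - lam * project_l1_ball(v / lam, radius=1.0)

def prox_l1_linf(W, lam):
    """Row-wise prox of the l1/l-infinity block penalty on a (features, tasks) matrix."""
    return np.array([prox_linf(row, lam) for row in W])

# One hypothetical proximal-gradient step on the block-regularized component x_1,
# given a gradient G of the data-fit term and a step size t:
# W1 = prox_l1_linf(W1 - t * G, t * lam)
```

The plain l1 part on x_2 would be the usual soft-thresholding prox.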

> This creates fits with greater sparsity than l1/l2 and fewer false positives,

Is this based on empirical or theoretical evidence? It seems that Quattoni's paper shows a better AUC with q = infty than with q = 2, but the other one (the l1/lq penalization paper, https://arxiv.org/pdf/1009.4766) says the opposite.