Open tomwenseleers opened 4 years ago
I am not aware of any implementation of a general l1-lq regularization either. It would be a good feature to add to mutar. Even the Dirty models variant (https://papers.nips.cc/paper/4125-a-dirty-model-for-multi-task-learning), which decouples the variables into x = x_1 + x_2 and applies an l1-lq penalty on x_1 and an l1 penalty on x_2, was proposed with q = infty but was implemented in mutar with q = 2 for the simplicity of the algorithm. To update x_1 they use an l1-linfty iteration, and their pseudo-code seems more comprehensive than that of the other papers (see page 24 of the appendix). It should be straightforward to implement.
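For what it's worth, the core of such an l1-linfty iteration is just the prox of the row-wise linfty norm, which by Moreau decomposition is the residual of Euclidean projection onto the l1 ball (the Duchi et al. 2008 projection). A minimal numpy sketch, not mutar's API — all function names here are mine:

```python
import numpy as np

def project_l1_ball(v, z):
    # Euclidean projection of v onto the l1 ball of radius z (Duchi et al. 2008)
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]           # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - z)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam):
    # Moreau decomposition: prox of lam*||.||_inf is the residual of
    # projection onto the l1 ball of radius lam (l1 and linf are dual norms)
    return v - project_l1_ball(v, lam)

def prox_l1_linf(B, lam):
    # row-wise prox for the l1/linf penalty lam * sum_j ||B[j, :]||_inf;
    # rows whose l1 norm is <= lam are zeroed jointly (shared sparsity)
    return np.array([prox_linf(row, lam) for row in B])
```

Rows with small coefficients get zeroed as a block, which is exactly the shared-sparsity behaviour discussed in the papers above.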
> This creates fits with greater sparsity than l1,l2 and fewer false positives
Is this based on empirical or theoretical evidence? Quattoni et al.'s paper seems to show a better AUC with q = infty than with q = 2, but the other one (l1-lq penalization, https://arxiv.org/pdf/1009.4766) reports the opposite.
I was wondering if in the future it might also be possible to support l1-linfinity penalization, which is known as the "simultaneous LASSO", see:

- Turlach 2005, https://www.tandfonline.com/doi/pdf/10.1198/004017005000000139
- Liu et al. 2009, https://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/fmri/papers/168-Blockwise-Coord-Descent.pdf (maybe the best and fastest algorithm; confusingly called multi-task LASSO here, but using the l1-linfinity penalty, unlike the l1-l2 penalty in sklearn's MultiTaskLasso)
- Quattoni et al. 2009, https://dspace.mit.edu/bitstream/handle/1721.1/59367/Collins_An%20efficient.pdf?sequence=1&isAllowed=y
- Vogt & Roth 2010, https://www.researchgate.net/profile/Volker_Roth/publication/262409253_The_group-lasso_l_1_regularization_versus_l_12_regularization/links/09e41512b178be6c04000000/The-group-lasso-l-1-regularization-versus-l-1-2-regularization.pdf
Or its generalization: l1-lq penalization, see https://arxiv.org/pdf/1009.4766
This creates fits with greater sparsity than l1-l2 and fewer false positives, so it could be useful for many multi-task learning applications with a shared sparsity structure... But I haven't found any open implementations anywhere...
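The whole l1-lq family is easy to prototype with proximal gradient descent, since the choice of q enters only through the row-wise prox operator. A minimal sketch under that assumption (all names hypothetical, not mutar's or sklearn's API), shown here with the l1-l2 block soft-threshold — the penalty of sklearn's MultiTaskLasso; passing a row-wise linfty prox instead would give the simultaneous LASSO:

```python
import numpy as np

def prox_l1_l2(B, t):
    # block soft-threshold: prox of t * sum_j ||B[j, :]||_2;
    # rows with norm <= t are zeroed jointly (shared sparsity)
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return B * scale

def multitask_ista(X, Y, lam, prox, n_iter=500):
    # proximal gradient (ISTA) for 0.5*||Y - X B||_F^2 + lam * penalty(B);
    # the penalty enters only through `prox`, so l1/lq variants
    # (q = 2, q = infinity, ...) differ only in this argument
    p = X.shape[1]
    B = np.zeros((p, Y.shape[1]))
    L = np.linalg.norm(X, ord=2) ** 2    # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y)
        B = prox(B - grad / L, lam / L)  # swap prox here for other q
    return B
```

Usage on a toy problem with two active rows out of five:

```python
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
B_true = np.zeros((5, 3))
B_true[0], B_true[1] = [1, 2, 3], [-1, 1, 0]
Y = X @ B_true
B_hat = multitask_ista(X, Y, lam=0.1, prox=prox_l1_l2)
```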