yngvem / group-lasso

Group Lasso implementation following the scikit-learn API
MIT License

Specify Multiple Groups for One Feature #10

Open jlevy44 opened 4 years ago

jlevy44 commented 4 years ago

Nice package! Just following up from another thread, in your package is it possible to specify multiple groups for one feature (eg. overlapping groups)? Thanks!

yngvem commented 4 years ago

There is currently no way of specifying that. However, it is on my wishlist for the future.

The reason this is not supported yet is that it is not immediately clear how it should be done, and all options are mathematically much more complex than the non-overlapping case.

  1. We could allow overlapping groups directly, so that when a group is excluded, the coefficients of all covariates in that group are set to zero. This leads to a prox operator that is very difficult to compute.
  2. We could introduce auxiliary variables, essentially copying each repeated column once for every group that contains it (the latent group lasso approach). This leads to a Lipschitz bound that is difficult to compute efficiently.
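
For reference, the two options above are usually written roughly as follows. The symbols are assumptions introduced here, not notation from this package: $f$ is the loss, $\lambda$ the regularisation strength, $\mathcal{G}$ the set of (possibly overlapping) groups, and $w_g$ the sub-vector of coefficients in group $g$. Per-group weights are omitted for brevity.

```latex
% Option 1: penalise the overlapping groups directly
\min_{w} \; f(w) + \lambda \sum_{g \in \mathcal{G}} \lVert w_g \rVert_2

% Option 2 (latent group lasso): one latent vector per group,
% with the coefficient vector recovered as w = \sum_g v^{(g)}
\min_{v^{(g)},\, g \in \mathcal{G}} \;
  f\Big(\sum_{g \in \mathcal{G}} v^{(g)}\Big)
  + \lambda \sum_{g \in \mathcal{G}} \lVert v^{(g)} \rVert_2
\quad \text{subject to } \operatorname{supp}\big(v^{(g)}\big) \subseteq g
```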

I am looking into it, but it is a low-priority issue unless I get many more requests for it. If your dataset is small, you can implement the second option manually by creating one copy of each column (covariate) for every group it belongs to.
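
A minimal sketch of that manual workaround, assuming a dense NumPy design matrix of shape `(n_samples, n_features)`. The helper names `expand_overlapping_groups` and `collapse_coefficients` are hypothetical and not part of this package; the idea is only to build an expanded matrix whose groups no longer overlap.

```python
import numpy as np


def expand_overlapping_groups(X, groups):
    """Duplicate columns so that every copy belongs to exactly one group.

    X      : array of shape (n_samples, n_features)
    groups : list of lists of column indices; the lists may overlap
    """
    # Original column index of every column in the expanded matrix
    column_index = np.concatenate([np.asarray(g, dtype=int) for g in groups])
    # Non-overlapping group label for every expanded column
    group_ids = np.concatenate(
        [np.full(len(g), k) for k, g in enumerate(groups)]
    )
    return X[:, column_index], group_ids, column_index


def collapse_coefficients(w_expanded, column_index, n_features):
    """Sum the coefficients of all copies of each original column."""
    w = np.zeros(n_features)
    np.add.at(w, column_index, w_expanded)
    return w
```

The returned `group_ids` can then be used as ordinary non-overlapping group labels when fitting on the expanded matrix, and `collapse_coefficients` maps the fitted coefficients back to the original features. Note that the expanded matrix grows with the total size of all groups, which is why this is only practical for small datasets.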

jlevy44 commented 4 years ago

I have a pretty big dataset. One of my ideas was to parallelize/scatter the l2 norms of the groups, but indexing/copying parts of the parameter matrix can be costly, as you mentioned.
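
As a rough illustration of computing all group norms without slicing or copying per group, here is a sketch for a 1-D coefficient vector with non-overlapping integer group labels. The function name `group_l2_norms` is made up for this example, and it says nothing about how the package computes norms internally.

```python
import numpy as np


def group_l2_norms(w, group_ids):
    """l2 norm of every group in one vectorised pass.

    w         : 1-D array of coefficients
    group_ids : 1-D array of non-negative integer group labels, same length as w
    """
    # bincount with weights sums w**2 within each label
    squared_sums = np.bincount(group_ids, weights=w ** 2)
    return np.sqrt(squared_sums)
```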

yngvem commented 4 years ago

The main reason the latent group lasso will be difficult to implement is that there is no longer an easy closed-form expression for the Lipschitz bound of the loss gradient.
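
For context, this is what such a closed-form bound looks like in the plain least-squares case. The sketch assumes the loss is scaled as `||Xw - y||^2 / (2 * n_samples)`; the scaling used inside the package may differ, and `lipschitz_squared_loss` is a name invented for this example.

```python
import numpy as np


def lipschitz_squared_loss(X):
    """Lipschitz constant of the gradient of ||Xw - y||^2 / (2 * n_samples)."""
    n_samples = X.shape[0]
    # For a 2-D array, ord=2 gives the largest singular value
    return np.linalg.norm(X, ord=2) ** 2 / n_samples
```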

However, if I at some point get the time to implement Poisson regression, I will first need to implement a line-search-based FISTA method. Once that is done, latent group lasso seems relatively straightforward.

Unfortunately, I do not have much time to develop this project before the summer, and a line search will require rewriting much of the code, so I will not add latent group lasso before July at the earliest.

Edit: I am now using a line search for the step size, so this could in theory be implemented. Unfortunately, I don't have time for that now, but I would welcome a pull request for it.
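
For anyone considering a pull request, here is a minimal sketch of why a line search removes the need for a Lipschitz bound. It is not the package's implementation: it is a single plain proximal gradient step (no FISTA momentum) with backtracking, for a squared loss and non-overlapping groups, and all function and parameter names are assumptions made for this example.

```python
import numpy as np


def group_soft_threshold(w, group_ids, threshold):
    """Prox of threshold * sum_g ||w_g||_2 for non-overlapping groups."""
    out = np.zeros_like(w)
    for g in np.unique(group_ids):
        mask = group_ids == g
        norm = np.linalg.norm(w[mask])
        if norm > threshold:
            # Shrink the whole group towards zero; otherwise it stays zero
            out[mask] = (1 - threshold / norm) * w[mask]
    return out


def backtracking_prox_step(X, y, w, group_ids, reg,
                           step=1.0, shrink=0.5, max_iter=50):
    """One proximal gradient step whose step size is found by backtracking."""
    n_samples = X.shape[0]

    def loss(v):
        residual = X @ v - y
        return 0.5 * residual @ residual / n_samples

    grad = X.T @ (X @ w - y) / n_samples
    loss_w = loss(w)
    for _ in range(max_iter):
        w_new = group_soft_threshold(w - step * grad, group_ids, step * reg)
        diff = w_new - w
        # Accept the step once the quadratic upper bound holds at w_new
        bound = loss_w + grad @ diff + diff @ diff / (2 * step)
        if loss(w_new) <= bound + 1e-12:
            break
        step *= shrink
    return w_new, step
```

Combined with column duplication, the same step could in principle be applied to the expanded design matrix, since the accepted step size is found numerically rather than from a closed-form bound.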