matteofrigo opened 3 years ago
Thanks for the contribution! This already looks great.
I agree that it would be nice to have GroupL1 and GroupL1nc merged as a single object.
I agree with you that, intuitively, the order in which the forgotten groups appear should not matter for the penalty. This might change if we allow different per-group regularization weights, but that's not the case yet.
This last part makes me wonder if we should weight the group regularization by the size of the group, as done for instance in Eq. (1) of https://statweb.stanford.edu/~tibs/ftp/sparse-grlasso.pdf . I think this makes a lot of sense when the groups have very different sizes. Do you have an opinion on this @matteofrigo? Of course this is somewhat orthogonal to your pull request, so feel free to ignore this point or postpone it for a later contribution.
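For concreteness, the size-weighted penalty from Eq. (1) of that paper can be sketched as follows (an illustrative snippet, not copt's actual API; `group_lasso_penalty` is a hypothetical name):

```python
import numpy as np

def group_lasso_penalty(x, groups, alpha=1.0):
    # Size-weighted group lasso penalty, as in Eq. (1) of the linked
    # paper: alpha * sum_g sqrt(|g|) * ||x_g||_2.
    # Illustrative sketch only, not copt's implementation.
    return alpha * sum(
        np.sqrt(len(g)) * np.linalg.norm(x[list(g)]) for g in groups
    )

x = np.array([3.0, 4.0, 0.0, 0.0, 5.0])
val = group_lasso_penalty(x, [(0, 1), (4,)])  # sqrt(2)*5 + 1*5
```

Without the `sqrt(len(g))` factor, large groups contribute no more than small ones per unit of norm, which is the imbalance the weighting corrects.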
As for the second concern, I would stop at non-overlapping groups, at least for this pull request. The code for more general objects would be significantly different, at which point I would split it into a different object.
I added the possibility to define group-specific penalties and some more general tests. Also, the two classes are merged back to a single one (GroupL1).
For the moment I added the possibility to use group-wise weights like the ones in the paper you mentioned (`sqrt(len(g))`), its inverse, and custom weights passed by the user. By default it uses a weight equal to 1 for each group. Anything else that you think should go there @fabianp?
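A minimal sketch of the weight options just described (the helper name and keyword strings are my assumptions, not copt's actual API):

```python
import numpy as np

def group_weights(groups, weights="ones"):
    # Hypothetical helper mirroring the options described above;
    # not the actual copt code.
    sizes = np.array([len(g) for g in groups], dtype=float)
    if weights == "ones":     # default: weight 1 for each group
        return np.ones(len(groups))
    if weights == "sqrt":     # sqrt(len(g)), as in the paper
        return np.sqrt(sizes)
    if weights == "inverse":  # its inverse
        return 1.0 / np.sqrt(sizes)
    return np.asarray(weights, dtype=float)  # custom user-given weights
```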
It seems like my changes to how the `B` matrix is built broke everything. I'll look into it. The dense-matrix case still works, anyway.
The problem was in a test where the group sparsity was used with overlapping groups. I added a separate class for that case to keep the two worlds separate as @fabianp suggested in https://github.com/openopt/copt/pull/94#issuecomment-809608046 .
To summarise, here's what we did with this PR so far:
Anything to add?
Thanks Matteo. We're almost there, but there's a test that has been commented out, and I think it's important to keep it, as it tests a valid implementation of the overlapping group lasso.
Sorry @fabianp , I've been offline for some days.
I removed the test because I misunderstood the suggestion to avoid the overlapping-groups case.
Speaking about that test, I'm not able to make it pass. I haven't been able to track the bug to its root, but it seems to me that the new definition of the block matrix (in the penalty) changes the definition of the support matrix in a way that makes it wrong. Notice that the only actual difference with the old implementation of the penalty is the order in which we treat the groups. Do you have any low-level way to debug and test the `prox_factory` function? I haven't found anything except testing the correctness of the block matrix.
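One generic way to test a prox implementation independently of the block-matrix bookkeeping is to compare it against a brute-force numerical minimizer of the prox objective. This is a debugging sketch under my own assumptions, not existing copt test code:

```python
import numpy as np
from scipy.optimize import minimize

def brute_force_prox(x, penalty, step=1.0):
    # prox_{step * penalty}(x) is by definition the minimizer of
    #   0.5 * ||z - x||^2 + step * penalty(z).
    # Solving this numerically gives a reference value to compare a
    # prox implementation against. (Debugging sketch only; for
    # non-smooth penalties a derivative-free solver may be needed.)
    obj = lambda z: 0.5 * np.sum((z - x) ** 2) + step * penalty(z)
    return minimize(obj, x).x

# Sanity check with a smooth penalty whose prox is known in closed
# form: for penalty(z) = 0.5 * ||z||^2, prox(x) = x / (1 + step).
x = np.array([2.0, -0.5, 1.0])
z = brute_force_prox(x, lambda z: 0.5 * np.sum(z ** 2))
```

Running the GroupL1 prox and this reference on the same random inputs would localize whether the bug is in the prox itself or in the support/block-matrix construction.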
This PR adds a new class to the copt.penalty module which defines a non-contiguous version of the existing GroupL1 penalty. The main differences between the two are:

- the `B` matrix is adapted to the non-contiguous setting (it's no longer block-diagonal);
- `_prox_gl` is adapted to the non-contiguous setting.

Minor style/PEP8 improvements have been applied to the modified files.
I've still got a couple of concerns on which I would like to have your opinion @fabianp.

The `B` matrix in `prox_factory` that I implemented in GroupL1nc is slightly different with respect to the one of GroupL1. The difference is in how the "forgotten" features are encoded in `B`. Here's a small example: let `n_features = 5` and `groups = [(0, 1), (3, 4)]`, then the `B` matrix for the two methods is the following:

I made a few tests and it seems to me that this does not change the result, but I don't have a proof for it. What happens is that we reorder the groups by putting all the "-1" singletons after the groups. Are you aware of any reason why it shouldn't work?
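The reordered encoding described above can be sketched like this (a hypothetical `build_B` helper for illustration, not the actual `prox_factory` code):

```python
import numpy as np

def build_B(n_features, groups):
    # Hypothetical sketch of the GroupL1nc encoding described above
    # (not copt's actual code): one row per group with 1s on the
    # group's features, then one "-1" singleton row appended at the
    # end for each "forgotten" feature not covered by any group.
    covered = {i for g in groups for i in g}
    rows = []
    for g in groups:
        row = np.zeros(n_features)
        row[list(g)] = 1.0
        rows.append(row)
    for i in range(n_features):
        if i not in covered:
            row = np.zeros(n_features)
            row[i] = -1.0
            rows.append(row)
    return np.vstack(rows)

B = build_B(5, [(0, 1), (3, 4)])
# rows: group (0, 1), group (3, 4), then the singleton for feature 2
```

Since each feature still appears in exactly one row, permuting the rows only permutes the per-group terms of the penalty sum, which is why the order should not affect the result.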
After solving these two points we can think of merging GroupL1 and GroupL1nc, as the latter trivially generalizes the former.
This PR addresses #20.