jolars / slopecd


WIP feat: add adaptive PGD iterates for hybrid solver #22

Closed jolars closed 1 year ago

jolars commented 2 years ago

I'm not really sure this is a good idea in its current state, but something similar might prove useful. The idea is to use PGD updates adaptively, based on the duality gap, but as you can see here it works only so-so:

*(image attached)*

Depending on the settings it can also be worse, and just updating every kth epoch seems more stable. But maybe we can figure out a better way? Perhaps just check whether the clusters change between PGD epochs and, if not, halve the frequency of PGD updates, or something along those lines. WDYT?

Also, we definitely don't want to check (compute) the duality gap every epoch, so it doesn't make sense to use it like this.

jolars commented 2 years ago

I changed the strategy now to just check the number of clusters. If the number of clusters does not change between two PGD epochs, the PGD frequency is halved (pgd_freq is doubled). It seems to work better. The gains are not very large, but that's not surprising.
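For concreteness, here is a minimal sketch of the scheduling logic described above: double pgd_freq whenever the cluster count is unchanged between two consecutive PGD epochs. The function names (`schedule_pgd_epochs`, `update_pgd_freq`) and the simulation of cluster counts are purely illustrative, not the actual slopecd implementation; the CD epochs that would run in between are omitted.

```python
def update_pgd_freq(pgd_freq, n_clusters, n_clusters_prev):
    """Double pgd_freq (i.e. halve the PGD frequency) when the number of
    clusters is unchanged between two consecutive PGD epochs."""
    if n_clusters_prev is not None and n_clusters == n_clusters_prev:
        pgd_freq *= 2
    return pgd_freq


def schedule_pgd_epochs(cluster_counts, pgd_freq=5, max_epochs=50):
    """Return the epochs at which a PGD epoch would run, given the cluster
    count observed at each PGD epoch (simulated via `cluster_counts`).
    All other epochs would run coordinate descent on the fixed clusters."""
    pgd_epochs = []
    prev = None
    counts = iter(cluster_counts)
    epoch = 0
    while epoch < max_epochs:
        pgd_epochs.append(epoch)
        n = next(counts)  # cluster count after this PGD epoch
        pgd_freq = update_pgd_freq(pgd_freq, n, prev)
        prev = n
        epoch += pgd_freq
    return pgd_epochs


# With a stagnating cluster count, PGD epochs become progressively rarer:
print(schedule_pgd_epochs([10, 8, 8, 8, 8]))  # → [0, 5, 10, 20, 40]
```

Once the clustering stabilizes, the gap between PGD epochs grows geometrically, so almost all of the work shifts to the (cheaper) CD epochs.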

*(image attached)*

jolars commented 2 years ago

It would of course be possible to do fancier checking of whether the clusters have actually changed, but I figured that would be unnecessarily expensive. Maybe not, though.

JonasWallin commented 2 years ago

I am not sure I completely understand what you would like to try. Right now we run PGD by frequency, right? One could instead run PGD only when a cluster joins another cluster, or when a cluster jumps over another cluster in coefficient magnitude. This does not guarantee convergence, though (or rather, we can get stuck in a subproblem).

> I am not sure I completely understand what you would like to try.

This is basically what I'm trying here: #22

I was thinking that this is actually a subproblem we are solving (conditioning on no cluster merges or jumps), so in that case shouldn't one just look at the subproblem, which is a weighted lasso?

jolars commented 2 years ago

> I was thinking that this is actually a subproblem we are solving (conditioning on no cluster merges or jumps), so in that case shouldn't one just look at the subproblem, which is a weighted lasso?

Oh, you mean fix the clusters and just run a weighted lasso at some point. I'm not sure there's much benefit to that, actually. When the clusters do not change, the threshold updater should basically be equivalent to soft thresholding plus checking the direction and summing a subsequence of the lambdas, which I don't think will be a bottleneck.
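To illustrate the point about the fixed-cluster update being cheap: when a cluster's ranks are held fixed at [lo, hi), its effective penalty is just the sum of that subsequence of the lambdas, and the update reduces to soft thresholding. The sketch below is a simplified toy version of this (it omits the direction check mentioned above); the function name and signature are illustrative, not slopecd's API.

```python
import numpy as np


def cluster_threshold_update(c, grad, lipschitz, lambdas, lo, hi):
    """Toy fixed-cluster update: a gradient step on the cluster's common
    coefficient `c`, followed by soft thresholding with the summed
    penalties for the cluster's (fixed) ranks [lo, hi)."""
    thresh = lambdas[lo:hi].sum() / lipschitz  # subsequence of lambdas
    z = c - grad / lipschitz                   # gradient step
    return np.sign(z) * max(abs(z) - thresh, 0.0)  # soft thresholding


# Cluster occupying ranks 1 and 2 gets threshold 2 + 1 = 3:
print(cluster_threshold_update(5.0, 0.0, 1.0, np.array([3.0, 2.0, 1.0]), 1, 3))
```

This is a handful of flops per cluster, which is why it is unlikely to be a bottleneck compared with solving a full weighted-lasso subproblem.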