Closed jolars closed 2 years ago
@jolars I see almost no difference with and without line search.
I would expect LS to make a big difference for PGD. Is that normal?
Got it. In the experiment the true L is 7.82, and with our LS strategy the first L giving a valid step is 8, which we never decrease afterwards. Resetting L to 1 at each iteration is much better:
But 1 could still be larger than the true L, so this is not a robust choice. Let me try a Barzilai-Borwein (BB) step size rule.
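For reference, here is a minimal sketch of a BB-type Lipschitz estimate (the function name and the choice of BB variant are my assumptions, not the project's actual code). With s = x_k − x_{k−1} and y = ∇f(x_k) − ∇f(x_{k−1}), the BB2-type rule estimates 1/L as ⟨s, y⟩/⟨y, y⟩, so L ≈ ⟨y, y⟩/⟨s, y⟩:

```python
import numpy as np

def bb_lipschitz(x, x_prev, grad, grad_prev):
    """Estimate the Lipschitz constant L from consecutive iterates
    using a BB2-type rule: L ~= <y, y> / <s, y>.

    Returns None when the curvature estimate <s, y> is not positive,
    in which case the caller should fall back to a safe default.
    """
    s = x - x_prev          # iterate difference
    y = grad - grad_prev    # gradient difference
    sy = s @ y
    if sy <= 0:
        return None
    return (y @ y) / sy
```

On a quadratic with Hessian L·I this recovers L exactly; on non-quadratic problems the estimate can jump around a lot between iterations.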
Here's the result with
I have added some prints to check what LSBB outputs as a candidate for L. It is weird to me that it varies that much from one iteration to the next. Any idea @jolars?
```
iter 857
BB guess: 1.3306334061460234
L taken: 1.3306334061460234
iter 858
BB guess: 0.01572444409933038
L taken: 0.01572444409933038
iter 859
BB guess: 0.025714101634463915
L taken: 0.10285640653785566
iter 860
BB guess: 3.766654014877878
L taken: 3.766654014877878
Epoch: 861, loss: 19.784095140953134, gap: 1.70e-07
iter 861
BB guess: 3.8571876660201605
L taken: 3.8571876660201605
iter 862
BB guess: 2.5395177773296265
L taken: 2.5395177773296265
```
Right. Also, is it possible that `make_simulated_data()` and `real-sim` just work better because of how the data is simulated?
Anyway, how about something like this:

```python
if it % 10 == 0:
    L *= 0.9
```
I don't know anything about Barzilai-Borwein. Is it robust? I skimmed https://arxiv.org/abs/1910.07056 and it seems like it may need some babysitting?
Intuitively, what I would try to do is:

- start at the previous L (first L: 1);
- if the condition is satisfied, multiply by 0.9 for as long as it stays satisfied;
- otherwise, multiply by 2 until it is satisfied.

What do you think?
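For concreteness, that scheme could be sketched as follows (the `condition` callback is a hypothetical stand-in for whatever sufficient-decrease test we use; this is an illustration, not the PR's actual code — in particular I assume the returned L should still satisfy the condition after the shrinking phase):

```python
def adapt_L(L, condition):
    """Adapt the Lipschitz estimate L, starting from the previous value:
    shrink by 0.9 while the descent condition keeps holding,
    otherwise double until it does hold.

    `condition(L)` is a hypothetical callback returning True when
    the step with constant L passes the sufficient-decrease check.
    """
    if condition(L):
        # Shrink greedily, but stop before the condition would break.
        while condition(0.9 * L):
            L *= 0.9
    else:
        # Grow until the step becomes valid.
        while not condition(L):
            L *= 2.0
    return L
```

For a quadratic, where the condition is simply "L is at least the true Lipschitz constant", this converges to a value within a factor 1/0.9 of the true L.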
I don't know much about BB, but I have heard it is very good in practice. For quadratics I think there are convergence guarantees.
Sure, I like the general idea, but what exactly do you mean by `condition` here?
The one we have now:

```python
if q >= f * (1 - 1e-12):
    break
```
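For context, here is a sketch of a full backtracking PGD step built around that test; `f`, `prox`, and the variable names are assumptions for illustration, with `q` the quadratic model at the candidate point and `f(z)` the actual smooth loss:

```python
import numpy as np

def backtracking_pgd_step(x, f_val, grad, f, prox, L, growth=2.0):
    """One proximal gradient step with backtracking line search.

    Increase L until the quadratic model q upper-bounds the true
    smooth loss f at the candidate point z (up to a tiny relative
    tolerance). `f` and `prox` are stand-ins for the problem's
    smooth-loss and proximal-operator oracles.
    """
    while True:
        z = prox(x - grad / L, 1.0 / L)        # candidate point
        d = z - x
        q = f_val + grad @ d + 0.5 * L * (d @ d)  # quadratic model at z
        if q >= f(z) * (1 - 1e-12):            # the condition above
            return z, L
        L *= growth
```

On a quadratic with true Lipschitz constant 7.82, starting from L = 1 and doubling, this accepts at L = 8, matching what we observed in the experiment above.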
Hm, seems like a clever approach, but I wonder if it's worth computing all these inner products. Let me benchmark a few alternatives.
Hm, I tested some strategies; see the code for details. The results look a little weird, maybe I messed something up.
I've tried to clean this pull request up a bit and make it ready to be merged. I've reduced the vanilla line search strategies to a single one and simplified the interface a bit. I've also added a safeguarding scheme from https://arxiv.org/abs/1910.07056 for the BB step size rule. Here are some results on real data sets:
My opinion is that we only need to include a maximum of three of these for the paper, and I would probably choose FISTA+line search, PGD+LS+BB, and maybe Anderson?
Do we want to bring this in or are we happy as it is? I think maybe we should just leave it be and avoid the additional code bloat.
I agree. Let's just leave it.
This pull request adds a backtracking line search procedure for proximal gradient descent.