mlverse / torch

R Interface to Torch
https://torch.mlverse.org

implement strong_wolfe in optim_lbfgs, and related considerations #516

Closed by skeydan 3 years ago

skeydan commented 3 years ago

I was comparing R and Python behavior on an optimization problem (an outcome of this is https://github.com/mlverse/torch/pull/515/files).

The swallowed error aside, this problem impressively demonstrates how much line search can matter in some cases. Compare the traces below (iteration 1 in both cases; in R, the loss is Inf thereafter):

R:

Step:  1 
x:  20 20 
Loss:  28.28427 
x:  19.42929 19.57071 
Loss:  27.56284 
x:  18.62183 18.96322 
Loss:  26.54142 
x:  17.81386 18.3551 
Loss:  25.51836 
x:  17.00539 17.74621 
Loss:  24.49352 
x:  16.19644 17.13639 
Loss:  23.46671 
x:  15.38704 16.52541 
Loss:  22.43772 
x:  14.57726 15.91304 
Loss:  21.40634 
x:  13.76719 15.29895 
Loss:  20.37234 
x:  12.957 14.68274 
Loss:  19.33546 
x:  12.14691 14.06392 
Loss:  18.29546 
x:  11.33727 13.44186 
Loss:  17.2521 
x:  10.52855 12.81576 
Loss:  16.20516 
x:  9.721455 12.18463 
Loss:  15.15449 
x:  8.91698 11.5472 
Loss:  14.10007 
x:  8.116535 10.90186 
Loss:  13.04206 
x:  7.322118 10.2466 
Loss:  11.98094 
x:  6.536536 9.578884 
Loss:  10.91764 
x:  5.763702 8.895616 
Loss:  9.853705 
x:  -1949.697 -1773.551 
Loss:  2635.866 
After optimizer step: x:  -985.4553 -897.5367 

Python:

y:  28.284271240234375
x:  [19.429288864135742, 19.570711135864258]
y:  27.562843322753906
x:  [16.858238220214844, 17.636762619018555]
y:  24.307722091674805
x:  [2.7074530124664307, 6.992524147033691]
y:  6.502713203430176
x:  [-136.89385986328125, -141.30908203125]
y:  196.68052673339844
x:  [-16.822242736816406, -13.754308700561523]
y:  22.11886978149414
x:  [-1.3870973587036133, 2.642791509628296]
y:  3.9196858406066895
x:  [-40.89145278930664, -58.53330993652344]
y:  70.7560043334961
x:  [-6.983370780944824, -6.023548603057861]
y:  9.51268196105957
x:  [-2.514030933380127, 0.897631824016571]
y:  1.6892166137695312
x:  [-0.7226543426513672, 0.21777576208114624]
y:  -0.16630667448043823
x:  [293.8512268066406, -87.08942413330078]
y:  305.5713195800781
x:  [40.789024353027344, -12.085652351379395]
y:  41.6281623840332
x:  [5.272896766662598, -1.5592139959335327]
y:  4.585817813873291
x:  [0.3209821581840515, -0.09154215455055237]
y:  -0.5624837875366211
x:  [0.030059879645705223, -0.005317231640219688]
y:  -0.6139265894889832
x:  [0.1282764971256256, -0.034427136182785034]
y:  -0.7340162992477417
x:  [6.4074225425720215, -2.595301866531372]
y:  5.913571834564209
x:  [0.9565708041191101, -0.37223705649375916]
y:  0.03017193078994751
x:  [0.25083452463150024, -0.08441096544265747]
y:  -0.698489785194397
x:  [0.16349004209041595, -0.048788558691740036]
y:  -0.7461978793144226
x:  [-0.1562851518392563, 0.02214648202061653]
y:  -0.3759414553642273
x:  [0.12250158935785294, -0.03969617933034897]
y:  -0.8213021755218506
x:  [0.08982158452272415, -0.03244684264063835]
y:  -0.8875812292098999
x:  [0.06471659243106842, -0.026877857744693756]
y:  -0.9299168586730957
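
For reference, a minimal sketch of the kind of loop that produces traces of this shape. The exact objective used above isn't shown in this issue, so the two-parameter function below is only a placeholder:

```r
library(torch)

# Placeholder objective: NOT the function behind the traces above,
# just a stand-in with two parameters.
f <- function(x) torch_norm(x, p = 2)

x <- torch_tensor(c(20, 20), requires_grad = TRUE)
opt <- optim_lbfgs(list(x), lr = 1)

# LBFGS evaluates the closure several times per step (up to max_iter
# times by default), which is why each step prints many x/loss pairs.
calc_loss <- function() {
  opt$zero_grad()
  loss <- f(x)
  cat("x: ", as.numeric(x), "\n")
  cat("Loss: ", as.numeric(loss), "\n")
  loss$backward()
  loss
}

for (i in 1:3) {
  cat("Step: ", i, "\n")
  opt$step(calc_loss)
  cat("After optimizer step: x: ", as.numeric(x), "\n")
}
```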

Is there a specific reason we have not implemented strong_wolfe? If not, I can create a PR.
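
For reference, PyTorch exposes this through the line_search_fn argument of torch.optim.LBFGS, which accepts either None or "strong_wolfe". Mirroring that, a sketch of how the R call could look (hypothetical until implemented):

```r
# Hypothetical: mirrors PyTorch's torch.optim.LBFGS argument, where
# line_search_fn is either NULL (fixed step size) or "strong_wolfe".
opt <- optim_lbfgs(list(x), lr = 1, line_search_fn = "strong_wolfe")
```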

Additionally, I think it would be great to port [parts of] https://github.com/hjmshi/PyTorch-LBFGS, which, among other things, provides a few additional line search functions that work well with larger datasets. I think we would want that in a separate package (something like torchoptimizers), right? Is it a lot of effort to make our optimizer class extendable by other packages?
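
For concreteness, here is a minimal sketch of what such an extension point could look like: a hypothetical exported optimizer() constructor that a third-party package would call with its own initialize() and step() methods. The names and exact signature are illustrative assumptions, not the current API:

```r
library(torch)

# Hypothetical: assumes torch exports an optimizer() constructor for
# subclassing, which is what this issue is asking for.
optim_plain_sgd <- optimizer(
  "optim_plain_sgd",
  initialize = function(params, lr = 0.01) {
    super$initialize(params, defaults = list(lr = lr))
  },
  step = function() {
    with_no_grad({
      for (group in self$param_groups) {
        for (p in group$params) {
          if (!is.null(p$grad)) {
            # plain gradient-descent update: p <- p - lr * grad
            p$sub_(group$lr * p$grad)
          }
        }
      }
    })
  }
)

# A package exporting optim_plain_sgd would then be usable like any
# built-in optimizer, e.g.: opt <- optim_plain_sgd(list(x), lr = 0.1)
```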

dfalbel commented 3 years ago

> Is there a specific reason we have not implemented strong_wolfe? If not, I can create a PR.

No, I think I just wanted to get optim_lbfgs out as fast as possible.

> Is it a lot of effort to make our optimizer class extendable by other packages?

That should be fast; I can do it in the next few days.

dfalbel commented 3 years ago

Fixed in #517