rfeinman / pytorch-minimize

Newton and Quasi-Newton optimization with PyTorch
https://pytorch-minimize.readthedocs.io
MIT License

L-BFGS errors out on certain problems #8

Closed calvinmccarter closed 2 years ago

calvinmccarter commented 2 years ago

The following problem succeeds with other methods (e.g. 'cg'), but fails with 'l-bfgs'. However, the error seems to arise in the PyTorch backend, so perhaps I should instead file this as an issue with PyTorch.

import torch
import torchmin as tm

def myfun(xy):
    x, y = xy[0], xy[1]
    obj = (
        -0.13784 * torch.log(x)
        + 143.42357 * (x**2)
        + 23.90574 * x * y
        - 291.90029 * x
        + 1.0 * (y**2)
        - 24.41997 * y
    )
    return obj

xy_init = torch.tensor([1.0, 0.0])
res = tm.minimize(
    myfun,
    xy_init,
    method="l-bfgs",
    max_iter=10,
    disp=3,
)
a, b = res.x.numpy()
print(a, b)

The error output is:

IndexError                                Traceback (most recent call last)
Input In [1], in <module>
     15     return obj
     16 xy_init = torch.tensor([1.0, 0.0])
---> 17 res = tm.minimize(
     18     myfun,
     19     xy_init,
     20     method="l-bfgs",
     21     max_iter=10,
     22     disp=3,
     23 )    
     24 a, b = res.x.numpy()
     25 print(a, b)

File ~/sandbox/pytorch-minimize/torchmin/minimize.py:91, in minimize(fun, x0, method, max_iter, tol, options, callback, disp, return_all)
     89     return _minimize_bfgs(fun, x0, **options)
     90 elif method == 'l-bfgs':
---> 91     return _minimize_lbfgs(fun, x0, **options)
     92 elif method == 'cg':
     93     return _minimize_cg(fun, x0, **options)

File ~/sandbox/pytorch-minimize/torchmin/bfgs.py:381, in _minimize_lbfgs(fun, x0, lr, history_size, max_iter, line_search, gtol, xtol, normp, callback, disp, return_all)
    337 def _minimize_lbfgs(
    338         fun, x0, lr=1., history_size=100, max_iter=None,
    339         line_search='strong-wolfe', gtol=1e-5, xtol=1e-9,
    340         normp=float('inf'), callback=None, disp=0, return_all=False):
    341     """Minimize a multivariate function with L-BFGS
    342 
    343     Parameters
   (...)
    379         Result of the optimization routine.
    380     """
--> 381     return _minimize_bfgs_core(
    382         fun, x0, lr, low_mem=True, history_size=history_size,
    383         max_iter=max_iter, line_search=line_search, gtol=gtol, xtol=xtol,
    384         normp=normp, callback=callback, disp=disp, return_all=return_all)

File ~/miniconda3/envs/condo/lib/python3.8/site-packages/torch/autograd/grad_mode.py:28, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     25 @functools.wraps(func)
     26 def decorate_context(*args, **kwargs):
     27     with self.__class__():
---> 28         return func(*args, **kwargs)

File ~/sandbox/pytorch-minimize/torchmin/bfgs.py:219, in _minimize_bfgs_core(fun, x0, lr, low_mem, history_size, inv_hess, max_iter, line_search, gtol, xtol, normp, callback, disp, return_all)
    215     f_new, g_new, _, _ = closure(x_new)
    216 elif line_search == 'strong-wolfe':
    217     #  Determine step size via strong-wolfe line search
    218     f_new, g_new, t, ls_evals = \
--> 219         strong_wolfe(dir_evaluate, x, t, d, f, g, gtd)
    220     x_new = x + d.mul(t)
    221 else:

File ~/sandbox/pytorch-minimize/torchmin/line_search.py:183, in strong_wolfe(fun, x, t, d, f, g, gtd, **kwargs)
    176     f, g, t, ls_nevals = _strong_wolfe_extra(
    177         fun, x.view(-1), t, d.view(-1), f, g.view(-1), gtd, **kwargs)
    178 else:
    179     # in theory we shouldn't need to use pytorch's native _strong_wolfe
    180     # since the custom implementation above is equivalent with
    181     # extra_codition=None. But we will keep this in case they make any
    182     # changes.
--> 183     f, g, t, ls_nevals = _strong_wolfe(
    184         fun, x.view(-1), t, d.view(-1), f, g.view(-1), gtd, **kwargs)
    186 # convert back to torch scalar
    187 f = torch.as_tensor(f, dtype=x.dtype, device=x.device)

File ~/miniconda3/envs/condo/lib/python3.8/site-packages/torch/optim/lbfgs.py:177, in _strong_wolfe(obj_func, x, t, d, f, g, gtd, c1, c2, tolerance_change, max_ls)
    174         bracket_gtd[low_pos] = gtd_new
    176 # return stuff
--> 177 t = bracket[low_pos]
    178 f_new = bracket_f[low_pos]
    179 g_new = bracket_g[low_pos]

IndexError: list index out of range
rfeinman commented 2 years ago

@calvinmccarter Thanks for pointing this out - I get the same result on my machine and I'm trying to sort it out. At the moment I cannot reproduce the error using pytorch's native L-BFGS algorithm (see below), so perhaps the issue is with torchmin, not torch.

Code:

xy = torch.tensor([1.0, 0.0], requires_grad=True)

optimizer = torch.optim.LBFGS([xy],
                              line_search_fn='strong_wolfe',
                              max_iter=10)

def closure():
    optimizer.zero_grad()
    loss = myfun(xy)  # same objective as in the original report
    loss.backward()
    return loss

loss = optimizer.step(closure)

print(loss)

Output:

tensor(-148.4767, grad_fn=<SubBackward0>)
rfeinman commented 2 years ago

@calvinmccarter - after a deeper look, the issue here is your objective function.

Your objective function is only defined for x > 0 (torch.log returns nan for negative inputs). L-BFGS and the associated line search methods are not designed to handle constrained optimization problems like yours, so there is no guarantee that they will work properly. The fact that CG worked is just a coincidence (you got lucky that the optimization path never crossed into the negative domain).
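
For illustration, probing the objective at a point with negative x (something the line search is free to do) shows the nan it then has to handle:

import torch

# torch.log returns nan for negative inputs, so the objective is nan whenever
# the line search evaluates a point with x < 0
print(torch.log(torch.tensor(-0.5)))      # tensor(nan)
print(myfun(torch.tensor([-0.5, 0.0])))   # tensor(nan)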

Perhaps we should build in a sanity check that looks for nans in the objective to help troubleshoot issues like these.
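
For reference, a rough user-side version of that check might look like the following (the nan_checked wrapper here is a hypothetical helper written for illustration, not part of torchmin):

import torch
import torchmin as tm

def nan_checked(fun):
    # raise as soon as the objective produces a nan, instead of letting the
    # line search silently consume it
    def wrapper(x):
        f = fun(x)
        if torch.isnan(f).any():
            raise ValueError(f"objective returned nan at x = {x.detach()}")
        return f
    return wrapper

res = tm.minimize(nan_checked(myfun), torch.tensor([1.0, 0.0]), method="l-bfgs", max_iter=10)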

calvinmccarter commented 2 years ago

Yeah, I'm using the log as a log-barrier. Fwiw, the problem is fixed by using a custom autograd Function that returns -inf for negative inputs, as below:

class MyLog(torch.autograd.Function):
    # log(x) that returns -inf (instead of nan) for negative inputs

    @staticmethod
    def forward(ctx, inp):
        result = torch.log(inp)
        result[inp < 0] = float('-inf')
        ctx.save_for_backward(inp)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        inp, = ctx.saved_tensors
        grad_input = grad_output / inp
        grad_input[inp < 0] = float('inf')
        return grad_input

def myfun(xy):
    # r_1, ..., r_7 are defined elsewhere; they correspond to the
    # coefficients in the original example above
    x, y = xy[0], xy[1]
    obj = (
        round(r_1, 5) * MyLog.apply(x)
        + round(r_2 + r_3, 5) * (x**2)
        + round(r_4, 5) * x * y
        + round(r_5, 5) * x
        + round(r_6, 5) * (y**2)
        + round(r_7, 5) * y
    )
    return obj

Arguably, there is a bug in torch/optim/lbfgs.py:

low_pos, high_pos = (0, 1) if bracket_f[0] <= bracket_f[-1] else (1, 0)

Their Strong Wolfe zoom phase chooses (low_pos, high_pos) = (1, 0) even when (bracket_f[0], bracket_f[-1]) = (finite_number, nan), i.e. the nan entry gets selected as the low point of the bracket. I'll open an issue with torch.
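
A standalone sketch of that comparison (with made-up bracket values) shows how it happens:

import math

bracket_f = [-10.0, math.nan]  # (finite_number, nan)
low_pos, high_pos = (0, 1) if bracket_f[0] <= bracket_f[-1] else (1, 0)
print(low_pos, high_pos)  # (1, 0): comparisons with nan are False, so the nan entry becomes the "low" point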