stevengj / nlopt

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

SLSQP fails for high condition number quadratic function #86

Open Alaya-in-Matrix opened 8 years ago

Alaya-in-Matrix commented 8 years ago

I am benchmarking the SLSQP algorithm with a simple 10D quadratic function:

f(x) = x_1^2 + 10^6 \sum_{i=2}^{D} x_i^2
double myfunc(const std::vector<double> &x, std::vector<double> &grad, void *)
{
    ++counter;
    assert(x.size() == 10);
    double y      = pow(x[0], 2);
    double factor = 1e6;

    for (size_t i = 1; i < 10; ++i) 
        y += factor * pow(x[i], 2);
    if (!grad.empty())
    {
        assert(grad.size() == 10);
        grad[0] = 2 * x[0];
        for (size_t i = 1; i < 10; ++i)
            grad[i] = 2 * factor * x[i];
    }
    return y;
}

To my surprise, SLSQP fails on this simple function, but if I switch the algorithm to nlopt::LD_LBFGS, NLopt still optimizes it efficiently.

The NLopt version is 2.4.2 (as reported by nlopt::version); below is the full code:

#include <cassert>
#include <cmath>
#include <iostream>
#include <nlopt.hpp>
#include <random>
#include <vector>
using namespace std;
mt19937_64 engine(random_device{}());
static int counter = 0;
double myfunc(const std::vector<double> &x, std::vector<double> &grad, void *)
{
    ++counter;
    assert(x.size() == 10);
    double y      = pow(x[0], 2);
    double factor = 1e6;

    for (size_t i = 1; i < 10; ++i) 
        y += factor * pow(x[i], 2);
    if (!grad.empty())
    {
        assert(grad.size() == 10);
        grad[0] = 2 * x[0];
        for (size_t i = 1; i < 10; ++i)
            grad[i] = 2 * factor * x[i];
    }
    return y;
}
int main()
{
    nlopt::opt opt(nlopt::LD_LBFGS, 10); // LD_LBFGS succeeds; nlopt::LD_SLSQP fails on this function

    std::vector<double> lb(10, -100);
    std::vector<double> ub(10, 100);
    opt.set_lower_bounds(lb);
    opt.set_upper_bounds(ub);
    opt.set_min_objective(myfunc, NULL);
    opt.set_maxeval(1000);
    // opt.add_inequality_constraint(myfunc_constr, nullptr, 1);

    vector<double> x(10);
    uniform_real_distribution<double> distr(-100, 100);
    for (auto &vx : x) 
        vx = distr(engine);
    double minf = 1e20;
    vector<double> fake_grad;
    cout << "f(x0) =    " << myfunc(x, fake_grad, nullptr) << endl;

    try
    {
        counter = 0;
        nlopt::result result = opt.optimize(x, minf);
        cout << "exit val:  " << result << endl;
    }
    catch (runtime_error &err)
    {
        cerr << err.what() << endl;
    }
    cout << endl;
    cout << "optimized: " << minf << endl;
    cout << "counter:   " << counter << endl;
    cout << opt.get_algorithm_name() << endl;

    int maj, min, bugf;
    nlopt::version(maj, min, bugf);
    cout << "version: " << maj << "." << min << "." << bugf << endl;
    return 0;
}
kkofler commented 1 month ago

In fact, SLSQP fails whenever the gradients are large, even if the function is linear (and of course also for non-linear, non-quadratic functions).

See also the old mailing list threads about this from 2011.

I have converted Alexander Riess's example from half Python, half pseudocode to compilable C (riesstest.c):

#include <math.h>
#include <nlopt.h>
#include <stdio.h>

#define s 1e8

/* linear objective -s*x[0] with a constant, huge gradient -s */
double myfunc(unsigned n, const double *x, double *grad, void *my_func_data)
{
    if (grad) {
        grad[0] = -s;
    }
    return -s*x[0];
}

int main(void) {
  nlopt_opt opt = nlopt_create(NLOPT_LD_SLSQP, 1);
  nlopt_set_lower_bounds1(opt, -1.);
  nlopt_set_upper_bounds1(opt, 1.);
  nlopt_set_min_objective(opt, myfunc, NULL);
  nlopt_set_xtol_abs1(opt, 1e-10);
  nlopt_set_xtol_rel(opt, 1e-10);
  nlopt_set_ftol_abs(opt, 1e-10);
  nlopt_set_ftol_rel(opt, 1e-10);
  double x[] = {0.};
  double minf;
  int status = 0;
  if (nlopt_optimize(opt, x, &minf) < 0) {
    printf("nlopt failed!\n");
    status = 1;
  }
  else {
    printf("found minimum at f(%g) = %0.10g\n", x[0], minf);
  }
  nlopt_destroy(opt);
  return status;
}

and added it to the tests CMakeLists.txt:

add_executable (riesstest riesstest.c)
target_link_libraries (riesstest ${nlopt_lib})
target_include_directories (riesstest PRIVATE ${NLOPT_PRIVATE_INCLUDE_DIRS})
add_dependencies (tests riesstest)

so I was able to run the thing through a debugger.

I tracked the issue down to a defect in LSEI which seems to have been known for ages as well.

Where it all breaks down is this line in the subroutine ldp_:

fac = one - ddot_sl__(m, &h__[1], 1, &w[iy], 1);

(which computes the inner product between the input vector H (h__) and the solution from nnls_ (&w[iy])). The resulting fac must not be zero; in fact, fac + one must not round to one, which is checked right below the above line (though even relaxing that check would not fix it, see the explanation below). But in the Riess test case, fac is approximately 1/s²: for s=1e3, fac is around 1e-6, and for s=1e8 it is around 1e-16 and completely eaten up by cancellation, resulting in a fac of exactly zero (so even checking only for exact zero would not fix anything) and an error.

In addition, the solution returned by nnls_ also hits roundoff limits for s=1e8: the second component is 1.0000000100000002e-08, and that trailing 2 is where the digits we would actually need begin. So even if we somehow managed to compute fac exactly, it would still fail, because the &w[iy] computed by nnls_ has already been destroyed by roundoff cancellation.
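
A standalone illustration of that resolution limit (plain C, independent of NLopt): doubles just below 1.0 are spaced 2^-53 ≈ 1.1e-16 apart, so an inner product whose true value is 1 - 1e-16 has to round to one of the two neighbouring doubles, and the subsequent subtraction retains no correct digits of the true fac:

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* largest double strictly below 1.0 */
    double below = nextafter(1.0, 0.0);
    /* for s=1e8 the true inner product is 1 - 1e-16, which lies between
       `below` and 1.0, so the computed value is one of those two */
    printf("spacing below 1.0:        %g\n", 1.0 - below); /* ~1.1102e-16 */
    printf("fac if dot rounds to 1.0: %g\n", 1.0 - 1.0);   /* exactly 0   */
    printf("fac if dot rounds down:   %g\n", 1.0 - below); /* ~1.1102e-16 */
    /* either way, nothing resembling the true fac = 1e-16 survives */
    return 0;
}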

A workaround I have found to work in my application (minimizing Gaussian mixture models (GMMs), where the large gradients are not at the optimum) is to lie to SLSQP about the gradients, scaling them down to an infinity norm of 1000 whenever their actual infinity norm is larger:

    /* if the gradient is too large, lie, or SLSQP will fail */
    /* for some reason, a non-power-of-2 works better here */
    #define GRAD_MAX 1000.
    double scale = 1.;
    int i;
    for (i = 0; i < DIMX; i++) {
      if (fabs(scale * grad_f[i]) > GRAD_MAX) {
        scale = GRAD_MAX / fabs(grad_f[i]); /* fabs so a negative component cannot flip the sign */
      }
    }
    if (scale < 1.) {
      /* scale the whole gradient uniformly, preserving its direction */
      for (i = 0; i < DIMX; i++) {
        grad_f[i] *= scale;
      }
    }

As I said, this works for me in my application. It will not necessarily do something reasonable in other applications, especially ones with huge gradients everywhere, including at the optimum. It does seem to also work at least in the Riess test, where the above can be simplified to:

if (grad[0] < -1000.) grad[0] = -1000.;
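
In context, the clamp sits inside the Riess objective like this (a sketch for illustration; myfunc_clamped is a made-up name, not part of the test):

/* Riess objective with the simplified gradient clamp applied */
double myfunc_clamped(unsigned n, const double *x, double *grad, void *data)
{
    if (grad) {
        grad[0] = -s;
        if (grad[0] < -1000.) grad[0] = -1000.; /* lie to SLSQP about the slope */
    }
    return -s*x[0]; /* the objective value itself is reported unchanged */
}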

But it may or may not work in other applications.

kkofler commented 1 month ago

I think ultimately the only way to solve this may be to use a different QP solver in place of the bundled LSEI code. The original SLSQP code recommended QPSOL, but that is proprietary and not publicly available. The Apache-licensed OSQP might be an option nowadays.

kkofler commented 4 weeks ago

Reading the description of the LSEI algorithms (in particular, the LSQ algorithm), I have since realized that the failure to compute fac can theoretically be fixed by replacing:

    fac = one - ddot_sl__(m, &h__[1], 1, &w[iy], 1);
    d__1 = one + fac;
    if (d__1 - one <= 0.0) {
        goto L50;
    }

with:

    fac = rnorm * rnorm;

(because what is computed in the first snippet is -r_{n+1}, which is provably equal to ||r||², and the rnorm returned by `nnls` is fairly accurate even in the ill-conditioned cases), but then the algorithm still breaks down because of rounding errors in some other step(s) of the LSEI procedure. (Maybe the computation of r_1 to r_n and x_1 to x_n further down in the same function? I am not sure yet where the problem(s) is/are.)
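
For reference, a sketch of where that identity comes from, following Lawson and Hanson's LDP-to-NNLS construction from memory (notation may differ slightly from the bundled code): the LDP problem \min \|x\| s.t. Gx \ge h is solved via the NNLS problem

\min_{u \ge 0} \|Eu - f\|, \qquad E = \begin{pmatrix} G^T \\ h^T \end{pmatrix}, \qquad f = e_{n+1} = (0, \dots, 0, 1)^T

with residual r = Eu - f. The NNLS optimality conditions give u^T E^T r = 0, hence \|r\|^2 = r^T(Eu - f) = -r^T f = -r_{n+1}, and since r_{n+1} = h^T u - 1, this is exactly fac = 1 - h^T u = -r_{n+1} = \|r\|^2.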

kkofler commented 3 weeks ago

The reason the algorithm still breaks down even with the computation of fac changed as above is that the solution from nnls_ is simply too inaccurate to obtain an accurate ldp_ solution (the relative errors are around 1e-8); the back-transformation done in lsi_ (daxpy_sl__(n, &one, &f[1], 1, &x[1], 1);) then cancels out all of the accurate digits (f is 1e8 and x should be 1 - 1e8, but actually comes out around -4 - 1e8).
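
The amplification is easy to reproduce in isolation: a relative error of about 1e-8 in an intermediate of magnitude 1e8 is an absolute error of order one, which is the entire magnitude of the final answer. A standalone sketch (the 5e-8 perturbation is illustrative, picked to match the -4 observed above):

#include <stdio.h>

int main(void)
{
    double f = 1e8;
    double x_exact  = 1.0 - 1e8;              /* what ldp_ should produce */
    double x_approx = x_exact * (1.0 + 5e-8); /* ~1e-8 relative error     */
    printf("exact back-transformation:     %g\n", f + x_exact);  /* 1    */
    printf("erroneous back-transformation: %g\n", f + x_approx); /* ~ -4 */
    return 0;
}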

The main issue with the Lawson-Hanson least-squares routines bundled with SLSQP is that they have to transform the problem into a simpler one step by step. LSEI ends up doing 3 transformations: LSEI→LSI→LDP→NNLS. Each transformation worsens the conditioning of the problem. Ultimately, the transformations seem (empirically – I have not done a formal error analysis) to square the condition number, i.e., double precision, with its relative accuracy of approximately 2e-16, ends up insufficient for a gradient of 1e8.
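
A back-of-the-envelope check of that empirical claim (plain C, not NLopt code): if the chain squares the condition number, a gradient of magnitude s leaves roughly eps*s^2 relative error in the solution, which reaches order one right around s = 1e8:

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* DBL_EPSILON ~ 2.2e-16; once eps*s^2 >= 1, no correct digits remain */
    for (double s = 1e2; s <= 1e9; s *= 10.0)
        printf("s = %.0e  ->  eps*s^2 ~ %.1e%s\n", s, DBL_EPSILON * s * s,
               DBL_EPSILON * s * s >= 1.0 ? "  (breakdown)" : "");
    return 0;
}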

kkofler commented 3 weeks ago

In https://github.com/kkofler/nlopt/commits/slsqp-osqp/ I have implemented an experimental OSQP backend for SLSQP.

Currently, the backend (OSQP or LSEI) has to be selected at compile time. It should probably be selectable as a parameter to SLSQP, if OSQP is compiled in at all.

To compile with OSQP support, you currently need to have OSQP compiled separately and installed somewhere. I have not bundled it.

With OSQP enabled, SLSQP now passes the existing tests plus the Alexander Riess test above up to s=1e31 (s=1e32 fails even with OSQP).

I have not yet tried it in my application.

kkofler commented 3 weeks ago

Note that I am using OSQP to solve the factored least-squares form of the QP, brought into QP form by a variable transformation as per: https://osqp.org/docs/release-0.6.3/examples/least-squares.html

OSQP would also be able to solve the expanded QP form, minimizing (EX-F)'(EX-F) = X'E'EX - 2(E'F)'X + F'F, where the constant term can be ignored, with E'E = L√D√DL' = LDL' = H and E'F = L√D(-inv(√D) inv(L) G) = -G. However, lsq_ only gets passed H in factored form, so one would have to multiply it back up to H, and OSQP would then factor it again.
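
For readers who do not want to chase the link, the trick in that OSQP example is to introduce an auxiliary variable y for the residual, turning the factored least-squares problem into a sparse QP (a sketch of the linked formulation):

\min_{x} \tfrac{1}{2}\|Ex - F\|^2 \quad\Longrightarrow\quad \min_{x,\,y} \tfrac{1}{2} y^T y \quad \text{s.t.} \quad y = Ex - F

with any bounds and linear constraints on x carried over unchanged.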

kkofler commented 2 weeks ago

Be warned that the OSQP backend linked above is still experimental. I have run into problems where OSQP simply will not converge (and my attempts at tweaking the OSQP settings have been unsuccessful so far).

kkofler commented 2 weeks ago

To repair the Lawson-Hanson algorithm, we would probably need to introduce scaling factors β and γ as in Section III of http://cse.lab.imtlucca.it/~bemporad/publications/papers/ieeetac-proxnnls.pdf – however, the step-by-step transformation from QP to NNLS (where SLSQP starts from the LSEI step) is sufficiently different from Bemporad's one-step transformation that I am not sure how to introduce those parameters correctly.