stevengj / nlopt

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

SLSQP fails for high condition number quadratic function #86

Open Alaya-in-Matrix opened 8 years ago

Alaya-in-Matrix commented 8 years ago

I am benchmarking the SLSQP algorithm with a simple 10D quadratic function:

f(x) = x_1^2 + 1e6 \sum_{i=2}^{D} x_i^2
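(For reference, the Hessian of this objective is the constant diagonal matrix diag(2, 2e6, ..., 2e6), so its condition number is 1e6, which is the high condition number in the title.)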
double myfunc(const std::vector<double> &x, std::vector<double> &grad, void *)
{
    ++counter;
    assert(x.size() == 10);
    double y      = pow(x[0], 2);
    double factor = 1e6;

    for (size_t i = 1; i < 10; ++i) 
        y += factor * pow(x[i], 2);
    if (!grad.empty())
    {
        assert(grad.size() == 10);
        grad[0] = 2 * x[0];
        for (size_t i = 1; i < 10; ++i)
            grad[i] = 2 * factor * x[i];
    }
    return y;
}

To my surprise, SLSQP fails on this simple function, but if I switch the algorithm to nlopt::LD_LBFGS, NLopt still optimizes the function efficiently.

The version of NLopt is 2.4.2 (as reported by nlopt::version). Below is the full code:

#include <cassert>
#include <cmath>
#include <iostream>
#include <nlopt.hpp>
#include <random>
#include <vector>
using namespace std;
mt19937_64 engine(random_device{}());
static int counter = 0;
double myfunc(const std::vector<double> &x, std::vector<double> &grad, void *)
{
    ++counter;
    assert(x.size() == 10);
    double y      = pow(x[0], 2);
    double factor = 1e6;

    for (size_t i = 1; i < 10; ++i) 
        y += factor * pow(x[i], 2);
    if (!grad.empty())
    {
        assert(grad.size() == 10);
        grad[0] = 2 * x[0];
        for (size_t i = 1; i < 10; ++i)
            grad[i] = 2 * factor * x[i];
    }
    return y;
}
int main()
{
    nlopt::opt opt(nlopt::LD_LBFGS, 10);

    std::vector<double> lb(10, -100);
    std::vector<double> ub(10, 100);
    opt.set_lower_bounds(lb);
    opt.set_upper_bounds(ub);
    opt.set_min_objective(myfunc, NULL);
    opt.set_maxeval(1000);
    // opt.add_inequality_constraint(myfunc_constr, nullptr, 1);

    vector<double> x(10);
    uniform_real_distribution<double> distr(-100, 100);
    for (auto &vx : x) 
        vx = distr(engine);
    double minf = 1e20;
    vector<double> fake_grad;
    cout << "f(x0) =    " << myfunc(x, fake_grad, nullptr) << endl;

    try
    {
        counter = 0;
        nlopt::result result = opt.optimize(x, minf);
        cout << "exit val:  " << result << endl;
    }
    catch (runtime_error &err)
    {
        cerr << err.what() << endl;
    }
    cout << endl;
    cout << "optimized: " << minf << endl;
    cout << "counter:   " << counter << endl;
    cout << opt.get_algorithm_name() << endl;

    int maj, min, bugf;
    nlopt::version(maj, min, bugf);
    cout << "version: " << maj << "." << min << "." << bugf << endl;
    return 0;
}
kkofler commented 3 days ago

In fact, SLSQP fails whenever the gradients are large, even if the function is linear (and of course also for non-linear, non-quadratic functions).

See also those old mailing list threads from 2011:

I have converted Alexander Riess's example from half Python, half pseudocode to compilable C (riesstest.c):

#include <math.h>
#include <nlopt.h>
#include <stdio.h>

#define s 1e8

double myfunc(unsigned n, const double *x, double *grad, void *my_func_data)
{
    if (grad) {
        grad[0] = -s;
    }
    return -s*x[0];
}

int main(void) {
  nlopt_opt opt = nlopt_create(NLOPT_LD_SLSQP, 1);
  nlopt_set_lower_bounds1(opt, -1.);
  nlopt_set_upper_bounds1(opt, 1.);
  nlopt_set_min_objective(opt, myfunc, NULL);
  nlopt_set_xtol_abs1(opt, 1e-10);
  nlopt_set_xtol_rel(opt, 1e-10);
  nlopt_set_ftol_abs(opt, 1e-10);
  nlopt_set_ftol_rel(opt, 1e-10);
  double x[] = {0.};
  double minf;
  int status = 0;
  if (nlopt_optimize(opt, x, &minf) < 0) {
    printf("nlopt failed!\n");
    status = 1;
  }
  else {
    printf("found minimum at f(%g) = %0.10g\n", x[0], minf);
  }
  nlopt_destroy(opt);
  return status;
}

and added it to the tests CMakeLists.txt:

add_executable (riesstest riesstest.c)
target_link_libraries (riesstest ${nlopt_lib})
target_include_directories (riesstest PRIVATE ${NLOPT_PRIVATE_INCLUDE_DIRS})
add_dependencies (tests riesstest)

so I was able to run the thing through a debugger.

I tracked the issue down to a defect in LSEI which also seems to have been known for ages, see:

Where it all breaks down is this line in the subroutine ldp_:

fac = one - ddot_sl__(m, &h__[1], 1, &w[iy], 1);

(which computes the inner product between the input vector H (h__) and the solution returned by nnls_ (&w[iy])). The resulting fac must not be zero; in fact fac + one must not round to one, which is checked right below the line above (though even relaxing that check would not fix it, see the explanation below). But in the Riess test case, fac is approximately 1/s²: for s=1e3, fac is around 1e-6, and for s=1e8, fac is around 1e-16 and completely eaten up by cancellation, resulting in a fac of exactly zero (so even checking only for exact zero would not fix anything) and an error.

In addition, the solution returned by nnls_ also hits roundoff limits for s=1e8: the second component is 1.0000000100000002e-08, and that trailing 2 is where the digits we would actually need begin. So even if we somehow managed to compute fac exactly, it would still fail, because the &w[iy] computed by nnls_ has already been destroyed by roundoff cancellation.
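To put numbers on how little precision is available here, a tiny standalone sketch (using idealized values, not the actual ldp_ data) of what the subtraction in the fac line can resolve at s=1e8:

#include <float.h>
#include <stdio.h>

int main(void) {
    double s = 1e8;
    double true_fac = 1.0 / (s * s);  /* what fac should be, about 1e-16 */
    /* The dot product h.w is a number just below 1.0; merely rounding it
       to the nearest double already loses most of the information in fac. */
    double dot = 1.0 - true_fac;
    double fac = 1.0 - dot;  /* ~1.1e-16: the spacing of doubles just below 1.0 */
    printf("true fac      = %.17g\n", true_fac);
    printf("recovered fac = %.17g\n", fac);
    printf("DBL_EPSILON   = %.17g\n", DBL_EPSILON);
    return 0;
}

Even in this idealized version, where the dot product is rounded only once, the recovered fac carries no correct digits beyond its order of magnitude; in the real code the extra roundoff accumulated inside nnls_ and ddot_sl__ pushes it all the way to exactly zero.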

A workaround I have found to work in my application (minimizing Gaussian Mixture Models (GMMs), where the large gradients are not actually at the optimum) is to lie to SLSQP about the gradients, scaling them down to an infinity norm of 1000 whenever their actual infinity norm is larger:

    /* If the gradient is too large, lie, or SLSQP will fail.
       For some reason, a non-power-of-2 works better here. */
#define GRAD_MAX 1000.
    /* DIMX and grad_f come from the surrounding objective callback. */
    double scale = 1.;
    int i;
    for (i = 0; i < DIMX; i++) {
      if (fabs(scale * grad_f[i]) > GRAD_MAX) {
        /* fabs() keeps scale positive so the gradient's signs are preserved */
        scale = GRAD_MAX / fabs(grad_f[i]);
      }
    }
    if (scale < 1.) {
      for (i = 0; i < DIMX; i++) {
        grad_f[i] *= scale;
      }
    }

As I said, this works for me in my application. It will not necessarily do something reasonable in other applications, especially ones that have huge gradients everywhere, including at the optimum. It does seem to also work at least in the Riess test, where the above can be simplified to:

if (grad[0] < -1000.) grad[0] = -1000.;

But it may or may not work in other applications.
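For completeness, here is a self-contained sketch of the Riess test with that clipping applied inside the objective, using only the NLopt C API calls already shown in riesstest.c above (GRAD_MAX is the same arbitrary 1000 cutoff):

#include <nlopt.h>
#include <stdio.h>

#define s 1e8
#define GRAD_MAX 1000.  /* arbitrary cutoff, as in the workaround above */

/* Riess objective; the true function value is returned unchanged, only the
   gradient reported to SLSQP is clamped to [-GRAD_MAX, GRAD_MAX]. */
double myfunc_clipped(unsigned n, const double *x, double *grad, void *data)
{
    if (grad) {
        grad[0] = -s;
        if (grad[0] > GRAD_MAX) grad[0] = GRAD_MAX;
        if (grad[0] < -GRAD_MAX) grad[0] = -GRAD_MAX;
    }
    return -s * x[0];
}

int main(void) {
  nlopt_opt opt = nlopt_create(NLOPT_LD_SLSQP, 1);
  nlopt_set_lower_bounds1(opt, -1.);
  nlopt_set_upper_bounds1(opt, 1.);
  nlopt_set_min_objective(opt, myfunc_clipped, NULL);
  nlopt_set_xtol_rel(opt, 1e-10);
  double x[] = {0.};
  double minf;
  if (nlopt_optimize(opt, x, &minf) < 0)
    printf("nlopt failed!\n");
  else
    printf("found minimum at f(%g) = %0.10g\n", x[0], minf);
  nlopt_destroy(opt);
  return 0;
}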

kkofler commented 3 days ago

I think ultimately the only way to solve this may be to use a different QP solver in place of the bundled LSEI code. The original SLSQP code recommended QPSOL, but that is proprietary code that is not publicly available. The Apache-licensed OSQP might be an option nowadays.

kkofler commented 1 day ago

Reading the description of the LSEI algorithms (in particular, the LSQ algorithm), I have since realized that the failure to compute fac can theoretically be fixed by replacing:

    fac = one - ddot_sl__(m, &h__[1], 1, &w[iy], 1);
    d__1 = one + fac;
    if (d__1 - one <= 0.0) {
        goto L50;
    }

with:

    fac = rnorm * rnorm;

(because what is computed in the first snippet is -r_{n+1}, which is actually proven to be ||r||², and the rnorm returned by nnls_ is fairly accurate even in the ill-conditioned cases), but then the algorithm still breaks down because of rounding errors in some other step(s) of the LSEI procedure. (Maybe the computation of r_1 to r_n and x_1 to x_n below in the same function? I am not sure yet where the problem(s) is/are.)
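For reference, here is the identity behind that replacement, spelled out under the assumption that the bundled ldp_ follows the standard Lawson-Hanson LDP construction: E = [G^T; h^T], f = e_{n+1} = (0, ..., 0, 1)^T, u is the NNLS solution of min ||E u - f|| with u >= 0, and r = E u - f is the residual with ||r|| = rnorm. At the NNLS optimum, complementary slackness gives u^T E^T r = 0, i.e. r^T E u = 0. Since E u = r + f, this means 0 = r^T (r + f) = ||r||² + r_{n+1}, so r_{n+1} = -||r||². And because row n+1 of E is h^T and f_{n+1} = 1, we have r_{n+1} = h·u - 1, hence fac = 1 - h·u = -r_{n+1} = ||r||² = rnorm².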

kkofler commented 12 hours ago

The reason the algorithm still breaks down even if I change the computation of fac as above is that the solution from nnls_ is just too inaccurate to obtain an accurate ldp_ solution (the relative errors are around 1e-8). The back-transformation done in lsi_ (daxpy_sl__(n, &one, &f[1], 1, &x[1], 1);) then ends up canceling out all the accurate digits: f is 1e8 and x is supposed to be 1 - 1e8, but actually ends up around -4 - 1e8.
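In exact arithmetic that back-transformation would give x + f = (1 - 1e8) + 1e8 = 1, but an absolute error of a few units in the 1e8-sized x (a relative error of only a few times 1e-8, consistent with the error level coming out of nnls_/ldp_) passes through the addition unchanged, which is how the final component comes out near -4 instead of 1.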

The main issue with the Lawson-Hanson least-squares routines bundled in SLSQP is that they have to transform the problem into a simpler problem step by step. LSEI ends up doing 3 transformations: LSEI→LSI→LDP→NNLS. Each of these transformations worsens the condition of the problem. Ultimately, the transformations seem (empirically; I have not done a formal error analysis) to end up squaring the condition number, i.e., double precision with its relative accuracy of approximately 2e-16 ends up being insufficient for a gradient of 1e8.
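In numbers: double precision carries roughly 16 significant digits (eps ≈ 2.2e-16). If the LSEI→LSI→LDP→NNLS chain effectively squares the condition number, a gradient scale of 1e8 becomes an effective 1e16 and no correct digits survive, whereas 1e3 becomes 1e6 and roughly ten digits remain, which is consistent with the fac ≈ 1e-6 vs. fac ≈ 1e-16 values quoted further up in this thread.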