need support for soft feasibility constraints

jpritikin commented 9 years ago

At least for local gradient-based methods, OpenMx needs support for soft feasibility constraints expressed by returning NaN for the fit. We use these in situations where the constraint is not expected to be active at the MLE. For example, a model may assume a positive definite covariance matrix and return NaN when the matrix is not positive definite. It is probably possible to add an inequality constraint to ensure positive definiteness, but any plausible MLE will be positive definite. Therefore, it is much easier to score such points as NaN.

Another example is when we estimate confidence intervals using a profile likelihood. The fit function when estimating the confidence interval includes a squared distance from a target log likelihood value. Here we use a soft constraint such that large square distances (>100) are scored as NaN. The distance at the MLE should be zero so this constraint is never active at a valid MLE.

Our proprietary optimizer, NPSOL, can handle non-linear constraints but it also knows that we use NaN to express soft constraints. When NPSOL evaluates the fit and gets a NaN, it halves the step size and tries again. If this continues to fail after a few retries then it gives up. One way to implement a similar kind of functionality using NLOPT is to return std::numeric_limits::max(). However, NLOPT takes this fit at face value and it screws up the Hessian approximation (tested with SLSQP).

Is there some existing functionality to handle this kind of situation in NLOPT? Or are you willing to accept patches implementing it?

stevengj commented 9 years ago

This will depend a lot on the specific algorithm. Algorithms that are gradient-based (or which internally construct an approximate gradient, like COBLYA) will have a hard time with something like this—they need differentiability, not to mention continuity. Algorithms which don't exploit continuity, like ISRES or DIRECT, could definitely be modified to handle this, but those algorithms tend to converge much more slowly.

In your example of squared distance, the simplest thing would just to be to add a distance² ≤ 100 inequality constraint; I don't see why you need NaNs for this.

The positive-definiteness constraint is a bit harder to implement, because eigenvalues are not generally differentiable. (SDP solvers handle positive-definiteness constraints, of course, but they require specialized methods.) In that case, couldn't you just as easily write your matrix as A = B' * B and then put your degrees of freedom in B rather than A? That would guarantee a positive semidefinite A implicitly with no loss of generality.

jpritikin commented 9 years ago

Algorithms that are gradient-based (or which internally construct an approximate gradient, like COBLYA) will have a hard time with something like this—they need differentiability, not to mention continuity.

I am not sure about COBLYA, but using NaN to signal a soft constraint is no problem BFGS or L-BFGS style algorithms. There is no expectation that the gradient can be evaluated close to the feasibility boundary. These soft constraints are not expected to be active near the MLE so this is not a problem in practice.

I don't see why you need NaNs for this.

Certainly it is possible to come up with real constraints that would have the same effect as our implicit soft constraints. The problem is that OpenMx has offered soft constraints for years. We need to maintain backward compatibility with modelling scripts written by a broad user community of researchers. We would really prefer not to fork NLOPT and make our own version just for this small feature.

mhunter1 commented 9 years ago

What we really need is some method of telling one of NLOPT's optimizers (e.g. SLSQP, or L-BFGS) that it has wondered into a bad region of the solution space. For example, many of the models OpenMx runs use some variety of covariance matrix which is assumed to be positive definite (PD). We do not currently use constraints to ensure the covariance matrix stays PD. Rather, at various points we attempt a Cholesky decomposition of the covariance matrix and if it fails then we know it is not PD. When the covariance matrix is not PD, we signal NPSOL that the attempted solution vector (i.e. the current values of the estimated parameters) is definitely not right. The signal that works for NPSOL is a fit function value of NaN. When we give NPSOL a fit function value of NaN, it tries a smaller step in the same direction. Is there a similar signal we could send to an NLOPT optimizer?

Joshua has tried NaN, but this seems to kill NLOPT. He's also tried std::numeric_limits::max(), but NLOPT then takes that ridiculous number literally and it screws up the derivative estimates. Thanks for your help on this!

jpritikin commented 9 years ago

For example, many of the models OpenMx runs use some variety of covariance matrix which is assumed to be positive definite (PD). We do not currently use constraints to ensure the covariance matrix stays PD.

I want to point out that we have no control over the model specification. Many of these models assume the behavior of our old proprietary NPSOL optimizer (a.k.a. e04ucf). Many of these models are published as supplementary online material in academic journals. We are trying to find some open-source optimizer that has equal or better performance compared to NPSOL on the same optimization problems so we can release OpenMx on CRAN.

stevengj commented 9 years ago

It's true that on algorithms that use line search with backtracking, you can just backtrack when a NaN is encountered. I'd be open to patches of this sort.

stevengj commented 9 years ago

Actually, it looks like some of the algorithms will work okay if you just return Inf if it strays into an infeasible region.

jpritikin commented 9 years ago

Can you be more specific? Which algorithms did you examine?

stevengj commented 9 years ago

CCSA at least, I think. Also some of the derivative-free methods that don't "in their hearts" construct a derivative, like Subplx or DIRECT or ISRES or CRS.

jpritikin commented 9 years ago

I'm curious to try this out. I tried with CCSA. Any idea what to try here to get it working?

fit: 1673.69
fit: 1673.69
grad:
-666.478
-930.557
-1789.43
CCSA dual converged in 2 iters to g=-6.77293e+23:
fit: nan
CCSA inner iteration: rho -> 10
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 100
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 1000
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 10000
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 100000
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 1e+06

The diagnostics say "nan" but I return infinity as the fit.

stevengj commented 9 years ago

@jpritikin this will only work if it's not an active constraint at the optimum, I think.

jpritikin commented 9 years ago

this will only work if it's not an active constraint at the optimum, I think.

Of course. We can specify explicit constraints in OpenMx for constraints that might be active at the optimum.

Can I take some point between xprev and xcur if xcur doesn't work?

stevengj commented 9 years ago

It should be enough to just increase rho by a factor of 10 and run another inner iteration.

jpritikin commented 9 years ago

It should be enough to just increase rho by a factor of 10 and run another inner iteration.

As you can see from the output, rho keeps going up and up with no progress. If I let it run, it does 100s of iterations like that.

stevengj commented 9 years ago

That's odd; with a larger rho, it should take smaller steps, from what I recall; I wonder why it is giving the same g? Can you print out where you objective is being evaluated and what it is returning?

jpritikin commented 9 years ago

Looks like it jumps to HUGE_VAL and stays there,

Running Univariate Regression of y on x1
coord: 0.2
0.8
0.8
fit: 1673.69
fit: 1673.69
grad:
-666.478
-930.557
-1789.43
CCSA dual converged in 2 iters to g=-6.77293e+23:
coord: 2e+20
2e+20
2e+20
fit: nan
CCSA inner iteration: rho -> 10
CCSA dual converged in 2 iters to g=1673.69:
coord: 2e+20
2e+20
2e+20
fit: nan
CCSA inner iteration: rho -> 100

jpritikin commented 9 years ago

FYI, I pushed everything to https://github.com/jpritikin/OpenMx/tree/nlopt

After make install, you could run this yourself with,

IMX_OPT_ENGINE=NLOPT R --no-save -f inst/models/passing/IntroSEM-UnivariateStd.R

I copied the necessary nlopt source code directly into the OpenMx project just to ease debugging.

jpritikin commented 8 years ago

Steven, do you think BOBYQA could be modified to cope with soft feasibility constraints?

jpritikin commented 4 years ago

Yo, anybody home? Can we get my pull request merged?

stevengj / nlopt

need support for soft feasibility constraints #30