jump-dev / Hypatia.jl

interior point solver for general convex conic optimization problems

Numerical issues with Positivstellensatz and interpolant-based polynomial optimization #798

Closed projekter closed 2 years ago

projekter commented 2 years ago

I observe some numerical difficulties in constrained polynomial optimization with small problems. Before scaling them up, I wanted to ask whether you'd expect such behavior (i.e., the interpolant-based approach is simply not suitable for what I want to do), or whether this actually indicates a problem in my formulation or even in the solver. I apologize in advance for the long text, but I hope it puts the problem in context. If a new example arises from this, or you want to include my code in some way, I'm happy to contribute.

I want to use Hypatia's WSOS capabilities to optimize over polynomials with constraints. As far as I can see, there are two options to do this (using the PolyUtils framework):

I tried applying both approaches to a very simple example (one of Lasserre's original examples from when he introduced the moment problem):

maximize (x[1] - 1)² + (x[1] - x[2])² + (x[2] - 3)²
such that 1 - (x[1] - 1)² ≥ 0,
          1 - (x[1] - x[2])² ≥ 0,
          1 - (x[2] - 3)² ≥ 0
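As a quick plain-Julia sanity check (no solver involved), the objective indeed evaluates to 2 at each vertex of the feasible triangle:

```julia
# Objective and constraints of the example above, evaluated at the three
# vertices (1, 2), (2, 2), (2, 3) of the feasible triangle.
f(x1, x2) = (x1 - 1)^2 + (x1 - x2)^2 + (x2 - 3)^2
gs = [(x1, x2) -> 1 - (x1 - 1)^2,
      (x1, x2) -> 1 - (x1 - x2)^2,
      (x1, x2) -> 1 - (x2 - 3)^2]
for (x1, x2) in [(1, 2), (2, 2), (2, 3)]
    @assert f(x1, x2) == 2                   # maximum value is attained here
    @assert all(g(x1, x2) >= 0 for g in gs)  # and the point is feasible
end
```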

The feasible region is a triangle with vertices (1, 2), (2, 2), and (2, 3); the maximum value is 2, attained at all three vertices. The first approach with a custom domain works flawlessly. With my implementation of CustomBoxDomain, the domain would be defined like this (I chose a relatively loose enclosing rectangle)

dom = PolyUtils.CustomBoxDomain{Float64}(Float64[0, 1], Float64[3, 4], Function[
        (x1, x2) -> 1 - (x1 - 1)^2, (x1, x2) -> 1 - (x1 - x2)^2,
        (x1, x2) -> 1 - (x2 - 3)^2
    ], 2)

and then it's all standard interpolation and optimization. But since it is not hard to take appropriate samples in this geometry, this is perhaps not representative. My problem is with the second approach.
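For concreteness, the "standard interpolation and optimization" step might look roughly like this. This is only a sketch: it assumes the `dom` defined above, and that `PolyUtils.interpolate` returns `(U, pts, Ps)` as in Hypatia's polymin examples (the exact signature may differ between Hypatia versions):

```julia
import Hypatia, Hypatia.PolyUtils
import DynamicPolynomials
using JuMP

DynamicPolynomials.@polyvar x[1:2]
f = (x[1] - 1)^2 + (x[1] - x[2])^2 + (x[2] - 3)^2

# interpolate on the custom domain with halfdegree d = 2
(U, pts, Ps) = PolyUtils.interpolate(dom, 2)
obj = [f(x => pts[j, :]) for j in 1:U]  # objective at the interpolation points

model = Model(Hypatia.Optimizer)
@variable(model, y)
# y - f must be WSOS on the domain, so y upper-bounds the maximum of f
@constraint(model, y .- obj in Hypatia.WSOSInterpNonnegativeCone{Float64, Float64}(U, Ps))
@objective(model, Min, y)
optimize!(model)
```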

Now, if I rewrite this problem in terms of Putinar's SOS decomposition,

minimize y
such that y - ( (x[1] - 1)² + (x[1] - x[2])² + (x[2] - 3)² )
            = SOS0 + ( 1 - (x[1] - 1)² ) SOS1 + ( 1 - (x[1] - x[2])² ) SOS2 + ( 1 - (x[2] - 3)² ) SOS3,
          SOS0 is an SOS polynomial of total degree up to 2d,
          SOS1, SOS2, SOS3 are SOS polynomials of total degree up to 2d - 2

then with 2d = 2, I get the upper bound 3, and with 2d = 4, the exact maximum 2 is obtained using the PSD matrix formulation (see this gist for solving the problem in arbitrary precision).
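For reference, the same PSD-matrix formulation can be stated compactly with SumOfSquares.jl (mentioned at the end of this thread). This is a sketch, not the gist's code, and assumes the standard SumOfSquares API:

```julia
import DynamicPolynomials, SumOfSquares, Hypatia
using JuMP

DynamicPolynomials.@polyvar x[1:2]
f = (x[1] - 1)^2 + (x[1] - x[2])^2 + (x[2] - 3)^2
gs = [1 - (x[1] - 1)^2, 1 - (x[1] - x[2])^2, 1 - (x[2] - 3)^2]

model = SumOfSquares.SOSModel(Hypatia.Optimizer)
@variable(model, y)
# Putinar certificate: y - f lies in the quadratic module of the gs
dom = SumOfSquares.@set gs[1] >= 0 && gs[2] >= 0 && gs[3] >= 0
@constraint(model, y - f >= 0, domain = dom, maxdegree = 4)  # 2d = 4
@objective(model, Min, y)
optimize!(model)
```

With `maxdegree = 4` this corresponds to the relaxation that the text above reports as giving the exact maximum 2.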

I would now expect that writing the "is an SOS polynomial" conditions in terms of the WSOSInterpNonnegativeCone should give the same results. To get the appropriate P matrices upon interpolation, I extended PolyUtils.interpolate a bit: basically, I have to cut off more columns from the full matrix P0 in order to restrict the degree of my SOS polynomials. Then there are two choices: either I can still use the WSOS approach for SOS0...SOS3, so that they also contain the boundaries of the rectangle, or I just use the unweighted approach.

import Hypatia.PolyUtils
import Hypatia
import DynamicPolynomials
using JuMP

# evaluate a polynomial at all U interpolation points
function evalPoly(poly)
    global pts, U, x
    [poly(x => pts[j, :]) for j in 1:U]
end;

n = 2;
d = 2;
DynamicPolynomials.@polyvar x[1:n];
dom = PolyUtils.BoxDomain{Float64}(Float64[0, 1], Float64[3, 4]);
(U, pts, Ps, constraint_Ps) = PolyUtils.interpolate(dom, d, constraint_degs=[2, 2, 2]);

obj = evalPoly((x[1] - 1)^2 + (x[1] - x[2])^2 + (x[2] - 3)^2);
gs = evalPoly.([1 - (x[1] - 1)^2, 1 - (x[1] - x[2])^2, 1 - (x[2] - 3)^2]);

Now constraint_Ps is a vector that contains the appropriate Ps matrices for SOS1...SOS3 (here, as they all have the same degree, it's just three times the same matrix). Basically, constraint_Ps[1][1] is the same as Ps[1][:, 1:3] (corresponding to the degrees (0, 0), (1, 0), and (0, 1)). constraint_Ps[1][2] is based on constraint_Ps[1][1], but also multiplied by the square root of the box constraint (x[1] - 0) * (3 - x[1]), so it just has a single column.
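The column counts follow from the dimension of the polynomial space: keeping binomial(n + k, n) columns of the graded-ordered P0 restricts the basis to total degree ≤ k. A quick check for this example's numbers:

```julia
# Monomials of total degree <= k in n variables: binomial(n + k, n).
n = 2
@assert binomial(n + 1, n) == 3  # degree <= 1: (0, 0), (1, 0), (0, 1) -> 3 columns
@assert binomial(n + 2, n) == 6  # degree <= 2: 6 columns (the full P0 for d = 2)
```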

Now I form the model and optimize (here, I put SOS0 in the last column):

model = Model(Hypatia.Optimizer);
@variable(model, sos[1:U, 1:(length(gs) +1)]);
cone = Hypatia.WSOSInterpNonnegativeCone{Float64, Float64};
for i in 1:length(gs)
    @constraint(model, sos[:, i] in cone(U, constraint_Ps[i]));
end;
wsos = @constraint(model, sos[:, end] in cone(U, Ps));

@variable(model, y);
@constraint(model, y .- obj .== sum(sos[:, i] .* gs[i] for i in 1:length(gs)) .+
                                    sos[:, end]);

@objective(model, Min, y);
optimize!(model);
objective_value(model)

Now, this converges to the correct value in 231 iterations, with a NearOptimal status and a slow-progress warning. The 6x6 moment matrix based on dual(wsos) in the monomial basis allows extracting the optimal points {{2., 3.}, {2., 2.}, {1., 2.}} using the Hankel decomposition method of Harmouch. Problem solved - but neither the large number of iterations for such a small problem nor the status and warning are very comforting.

I tried several variations of this:

Can you see a reason why Hypatia struggles so much with this small problem, even though the PSD matrix formulation is extremely simple? And then of course the main question: if the interpolation points cannot be constructed so easily (or, which is also a main motivation for me, I want to actually use an SOS-PSD matrix in the qmodule: p = SOS0 + sum(SOS[i] g[i]) + tr(SOSPSD G)) and therefore the "native" weights cannot be used - does it make sense to employ an interpolation-basis approach instead of resorting to the much larger PSD formulation? In the end, all the potential gain may be negated if it comes at the cost of hundreds or thousands of iterations, with unstable monotonicity on top.

lkapelevich commented 2 years ago

Hi @projekter. Thanks for such a detailed post - you've gone deep into the weeds investigating possible ways to get things to work!

I'll take a few days to ponder and get back.

lkapelevich commented 2 years ago

or whether this actually indicates some problems in my formulation or even in the solver.

In summary, I don't see anything really bad about the models. The dual works a bit better for me. As you note, the interpolation-basis WSOS cone will work better if you can get "good" points from your domain and we are aware this can be tricky. For complicated domains, it's not clear what the best strategy is.

As far as I can see, there are two options to do this

Your summary is basically right. What you did for (1) is what I'd do. The ideal case is that you would know how to get points efficiently for your domain.

Regarding (2), I think a few things should be different:

My idea to also take into account the box constraint for SOS1...SOS3 may not be so good.

Right, those Ps shouldn't be multiplied by the square root of g, and they should have 3 columns in your example. Actually, your gs could be negative in this approach, so taking the square root might not make sense. I tried to copy and update your formulation here. (Some stuff is hard-coded to avoid using your PolyUtils functions.) I observe the same things as you: many iterations and/or poor convergence.

I'll add a third option, which is to work with the dual. For the problem:

min_{y,p} y:
    y - f = p_0 + p_1 g_1 + p_2 g_2 + p_3 g_3
    p_0 in SOS(n, 2d)
    p_1, p_2, p_3 in SOS(n, 2d - 2)

The dual is:

max_mu <mu, f>:
    <e, mu> = 1
    mu in SOS*(n, 2d)
    g_1 mu, g_2 mu, g_3 mu in SOS*(n, 2d - 2)
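In the interpolation basis, this dual might be modeled along the following lines. This is a sketch only: it assumes Hypatia's WSOSInterpNonnegativeCone constructor accepts a trailing use_dual flag (true requests the dual, moment-type cone), and reuses U, Ps, constraint_Ps, obj, and gs from the snippets above:

```julia
using JuMP, LinearAlgebra
import Hypatia

model = Model(Hypatia.Optimizer)
@variable(model, mu[1:U])         # discretized moment sequence
@constraint(model, sum(mu) == 1)  # the <e, mu> = 1 normalization
cone = Hypatia.WSOSInterpNonnegativeCone{Float64, Float64}
@constraint(model, mu in cone(U, Ps, true))  # mu in SOS*(n, 2d)
for i in 1:length(gs)
    # g_i mu in SOS*(n, 2d - 2); gs[i] holds g_i at the interpolation points
    @constraint(model, gs[i] .* mu in cone(U, constraint_Ps[i], true))
end
@objective(model, Max, dot(mu, obj))
optimize!(model)
```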

I added the dual in the same gist. Again, Hypatia fails to converge when the interpolation points for the interpolation basis are from [-1, 1]^n. But if the interpolation points are picked from [1.5, 2] x [2, 2.5] (a rectangle enclosed in the triangle), the dual converges in 12 iterations while the primal takes 92.

I deliberately avoided merging the SOS* cones above into a single cone, which would bring us back to what Hypatia's dual WSOS cone does. I.e. Hypatia's dual WSOS is:

SOS*(n, 2d) \cap g_1^{-1}(SOS*(n, 2d - 2)) \cap g_2^{-1}(SOS*(n, 2d - 2)) \cap g_3^{-1}(SOS*(n, 2d - 2))

For completeness, a fourth approach (that requires more work) is to fix this issue. It would require users to know some points from their domain of nonnegativity in order to get an initial interior point oracle, but the basis interpolation points could be from any region. I briefly revisited it on a branch but similar to above attempts, the number of iterations is very dependent on how the interpolation points are picked.

with unstable monotonicity on top

It's normal for the objective to oscillate; this is not a property of Hypatia or the WSOS cone.

May I also ask from curiosity where the polynomial problem you are solving is from?

projekter commented 2 years ago

Thank you for your detailed analysis - so I'll give the dual formulation a try; this looks promising. The problem I used is from doi:10.1137/S1052623400366802 (Example 5, page 811). I started reading up on polynomial optimization and then of course wanted to get a feeling for how all these different algorithms are actually implemented and how they perform. To check my implementation, I applied it to known test problems.

But actually, that's not really what I am interested in. What I want to do is apply polynomial optimization to quantum problems. In particular, I will have some semidefinite constraints (states and maps), linear constraints (trace), and also nonconvex quadratic constraints (a map applied to a state). The number of variables will be high, but by exploiting sparsity, I hope the relaxations will not blow up to an unmanageable size. Still, it would probably be good if I didn't have to rewrite this as an SDP. It is not hard to come up with a feasible initial point: a random guess of, e.g., the state, followed by a convex optimization over the map, will give a point. I could probably even come up with a good initial point based on heuristic optimizations (which I did so far), though whether this is worth the resources must be benchmarked. It is also not hard to define box constraints - e.g., due to the trace condition, every entry of the state matrix (let it be real for simplicity) lies in [-1, 1]. So I can use the good box sampler to generate the grid and then the (dual) SOS-PSD matrix formulation for the qmodule or moment matrices. But of course, I cannot use the weighted approach, since not every point in the grid will give rise to a PSD state.

I will try applying this to "small" actual problem instances and also compare with your branch that allows specifying initial values. Thanks for the help!

lkapelevich commented 2 years ago

Sounds cool. Just in case you haven't seen it, you can always use https://github.com/jump-dev/SumOfSquares.jl if you want to build polynomial models quickly (but with an SDP reformulation).