oxfordcontrol / COSMO.jl

COSMO: Accelerated ADMM-based solver for convex conic optimisation problems (LP, QP, SOCP, SDP, ExpCP, PowCP). Automatic chordal decomposition of sparse semidefinite programs.
https://oxfordcontrol.github.io/COSMO.jl/stable
Apache License 2.0

Dual infeasibility returned in feasible (and bounded) SDP #113

Closed: angeris closed this issue 4 years ago

angeris commented 4 years ago

I am also having an issue similar to the one in #90, but on a general SDP problem. Here is the current example. (I'm trying to get a slightly better MWE, but it's a weird one: setting n = 10 in this example succeeds and finds a nearly feasible point, while larger sizes like n = 100 fail and return a dual infeasibility status.)

using LinearAlgebra, JuMP, COSMO, SparseArrays

n = 100

m = Model(COSMO.Optimizer)

@variable(m, d[1:n])
@variable(m, λ[1:n])

# Diagonal matrix of the decision variables d
N = Diagonal(d)

T = [
    -2I + 4*N - Diagonal(λ)  2*N
    2*N                      N + Diagonal(λ)
]

dropzeros!(T)

v = [
    2 * N * ones(n)
    N * ones(n)
]

@variable(m, t)
@objective(m, Max, (ones(n)' * N * ones(n) - t)/2)
# The PSD constraint on Q encodes T ⪰ 0 and t ≥ v' * pinv(T) * v via a Schur complement
Q = [
    t   v'
    v   T
]
@constraint(m, Q ∈ PSDCone())

optimize!(m)

@show primal_status(m)
@info "General lower bounds : $(-objective_value(m))"
@info "Singular value lower bounds : $(n)"

I am trying to find a slightly simpler MWE, but small changes break it. On the other hand, the objective value returned is not terrible. (In fact, the exact objective value should be norm(ones(n))^2, which is just n.)

migarstka commented 4 years ago

I can take a closer look tomorrow.

angeris commented 4 years ago

What happens if you set decompose = false? This turns off any decomposition attempts.

Weirdly, in this case, a feasible (but very far from optimal) point is returned.
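(For reference, a minimal sketch of how such an option can be passed through JuMP, using the same keyword-argument / with_optimizer pattern that appears later in this thread; decompose is the relevant COSMO setting:)

using JuMP, COSMO
m = JuMP.Model(with_optimizer(COSMO.Optimizer, verbose = true, decompose = false))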

>>> Results
Status: Solved
Iterations: 480
Optimal objective: 3023.3675
Runtime: 5.177s (5176.9ms)

primal_status(m) = MathOptInterface.FEASIBLE_POINT
[ Info: General lower bounds : 3023.3675304153003
[ Info: Singular value lower bounds : 100.0

What happens if you set check_infeasibility = 100000 and max_iter = 10000? This disables infeasibility checks. Does it converge?

I increased the number of iterations and removed the infeasibility check, and it does OK:
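(Again a sketch of how those two settings can be passed, assuming the same keyword-argument style as above:)

m = JuMP.Model(with_optimizer(COSMO.Optimizer, check_infeasibility = 100000, max_iter = 10000))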

>>> Results
Status: Solved
Iterations: 2800
Optimal objective: 103.9837
Runtime: 2.092s (2092.14ms)

primal_status(m) = MathOptInterface.FEASIBLE_POINT
[ Info: General lower bounds : 103.9836991329853
[ Info: Singular value lower bounds : 100.0

(I have not changed the default tolerance here, so this is close enough to optimal.)

What happens if you change the value of rho (the step size of the algorithm) or set adaptive_rho = false?

As one would expect, roughly the same result:

>>> Results
Status: Dual_infeasible
Iterations: 1880
Optimal objective: 110.8402
Runtime: 1.423s (1422.51ms)

primal_status(m) = MathOptInterface.INFEASIBILITY_CERTIFICATE
[ Info: General lower bounds : 110.84022028945981
[ Info: Singular value lower bounds : 100.0

My guess is that somehow the scaling isn't quite right (or the dual infeasibility tolerance is a bit too loose) in the dual check. I haven't carefully gone through the code, so I'm not sure.

Thank you yet again, by the way, for responding to all of my issues! (I would absolutely love to use and cite COSMO for the paper I'm currently working on, especially since it seems to be by far the fastest of the solvers I've tried.)

migarstka commented 4 years ago

This is actually a really nice example problem that should showcase COSMO's strengths. Could you provide a bit more context on where these type of problems arise for you? (It would make a nice example for our documentation).

Your PSD constraint in JuMP produces several constraints:

  1. Make Q positive semidefinite
  2. Put zeros in Q where T has zeros

COSMO recognizes this kind of structure in a PSD constraint and tries to split the single PSD constraint on Q into many PSD constraints on non-zero subblocks of Q. The reason is that the solver algorithm scales roughly as O(n^3) for Q ∈ S^n_+, which is not great if n is large. So instead you would preferably solve an equivalent problem with p PSD constraints of much smaller size.

If you set the following settings:

strategy = with_options(COSMO.NoMerge)
m = JuMP.Model(with_optimizer(COSMO.Optimizer, verbose = true, merge_strategy = strategy, eps_dual_inf = 1e-6, max_iter = 10000 ))

The solver output actually tells you that it found a sparsity pattern and decomposed the constraint into 100 PSD constraints for you**:

------------------------------------------------------------------
          COSMO v0.7.3 - A Quadratic Objective Conic Solver
                         Michael Garstka
                University of Oxford, 2017 - 2020
------------------------------------------------------------------

Problem:  x ∈ R^{300},
          constraints: A ∈ R^{600x300} (899 nnz),
          matrix size to factor: 900x900,
          Floating-point precision: Float64
Sets:     PsdConeTriangle of dim: 6
          PsdConeTriangle of dim: 6
          PsdConeTriangle of dim: 6
          PsdConeTriangle of dim: 6
          PsdConeTriangle of dim: 6
          PsdConeTriangle of dim: 6
          ... and 95 more
Decomp:   Num of original PSD cones: 1
          Num decomposable PSD cones: 1
          Num PSD cones after decomposition: 100
          Merge Strategy: NoMerge
Settings: ϵ_abs = 1.0e-04, ϵ_rel = 1.0e-04,
          ϵ_prim_inf = 1.0e-06, ϵ_dual_inf = 1.0e-06,
          ρ = 0.1, σ = 1e-06, α = 1.6,
          max_iter = 10000,
          scaling iter = 10 (on),
          check termination every 40 iter,
          check infeasibility every 40 iter,
          KKT system solver: QDLDL
Setup Time: 2.6ms

The number of operations the solver performs at each iteration reduces from roughly 100^3 to 100 * 6^3, which is much smaller. This is just the initial decomposition; in a second step we try to merge some of these blocks back together if they overlap heavily. You can change this merging strategy. For now, I would suggest strategy = with_options(COSMO.ParentChildMerge) or strategy = with_options(COSMO.NoMerge), as I am currently in the process of removing a bug in the COSMO.CliqueGraphMerge strategy. Take a look at the section on Chordal Decomposition in the documentation if you are interested in how this works.

**I just noticed that the counter in the verbose output should say "... and 94 more". I will fix this in 2dcf4a8e0326ad8681c08a2187e716c40939faee.

angeris commented 4 years ago

This is actually a really nice example problem that should showcase COSMO's strengths.

Oh, absolutely! In fact, this is why I chose COSMO: I was thinking of implementing a simple ADMM-type method with the chordal decomposition schemes for the PSD constraint and its projection (I originally saw Vandenberghe's reference on this), because the SDPs I was solving were far too large for the usual methods that project the whole matrix variable onto the cone.

Could you provide a bit more context on where these type of problems arise for you? (It would make a nice example for our documentation).

These examples come from constructing lower bounds for physics problems. (See, e.g., this older paper of mine, which reduces to a convex QCQP; more general bounds are given by SDPs.) In this case, the matrices all have really nice (and compact) chordal decompositions, since many of the sub-blocks are usually discretizations of local operators (such as the Laplacian). They usually have some overlapping diagonal block substructure plus a small number of other terms on the off-diagonals.

I'd be happy to give some more concrete examples! (I don't really have any at the moment, since my current approach is not published anywhere or even available in preprint form; it's rather preliminary work.) On the other hand, some types of variational design problems (e.g., finding a design which minimizes the maximum eigenvalue of a particular physical equation) can be recast as SDPs and have perfect chordal structure. I have a truly horrible, somewhat rushed class-project write-up from a couple of years ago that discusses some of these problems in the context of quantum mechanics.

The number of operations the solver performs at each iteration reduces from roughly 100^3 to 100 * 6^3, which is much smaller. This is just the initial decomposition; in a second step we try to merge some of these blocks back together if they overlap heavily. You can change this merging strategy. For now, I would suggest strategy = with_options(COSMO.ParentChildMerge) or strategy = with_options(COSMO.NoMerge), as I am currently in the process of removing a bug in the COSMO.CliqueGraphMerge strategy. Take a look at the section on Chordal Decomposition in the documentation if you are interested in how this works.

Yes, exactly :). I'll try out ParentChildMerge; right now (at least in this example) the merge strategy shouldn't really find any merges anyway.

Notice also that I set eps_dual_inf = 1e-6 to decrease the tolerance for dual infeasibility checks. As you mentioned, 1e-4 is likely to be too high if your N is large. If you want to change the solver accuracy, you can use eps_abs and eps_rel.

Indeed, I've already changed both things in the program!

Anyway, thank you for the help! It would be wonderful to use COSMO for these types of problems, since it really does exploit a lot of the structure available in these kinds of physical problems.

goulart-paul commented 4 years ago

My understanding of the problem description is that, for any optimising pair (d, λ) and any permutation matrix P, the pair (Pd, Pλ) will also be an optimiser. This seems to imply that there will always exist an optimiser with all elements of d equal to each other, and likewise for the elements of λ. I also checked this numerically and it appears to be right.
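A minimal sketch of this kind of numerical check (the build_Q helper and the random test values are illustrative, not the original script):

using LinearAlgebra, Random

# Build the Q matrix from the MWE for fixed numerical values of d, λ, t.
function build_Q(d, λ, t)
    n = length(d)
    N = Diagonal(d)
    T = Matrix([
        -2I + 4N - Diagonal(λ)  2N
        2N                      N + Diagonal(λ)
    ])
    v = [2N * ones(n); N * ones(n)]
    Matrix([
        t   v'
        v   T
    ])
end

n = 5
d, λ, t = rand(n), rand(n), 3.0
p = randperm(n)

# Permuting d and λ simultaneously only permutes rows/columns of Q (by a block
# permutation), so the smallest eigenvalue and the objective are unchanged.
@show eigmin(Symmetric(build_Q(d, λ, t))) ≈ eigmin(Symmetric(build_Q(d[p], λ[p], t)))
@show (sum(d) - t) / 2 ≈ (sum(d[p]) - t) / 2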

Taking therefore the case with n = 1 and making a guess based on numerical trial, it seems that the optimal solution is λ = 2, d = k and t = λ + d, taking a limit as k \to \infty. The smallest eigenvalue of Q appears to approach 0 from below, so that the SDP constraint is satisfied only in the limit.

Assuming that the above is right (please do check since I was not that thorough), the fact that the optimiser is not obtainable is the root cause of the problem.

angeris commented 4 years ago

My understanding of the problem description is that, for any optimising pair (d, λ) and any permutation matrix P, the pair (Pd, Pλ) will also be an optimiser. This seems to imply that there will always exist an optimiser with all elements of d equal to each other, and likewise for the elements of λ. I also checked this numerically and it appears to be right.

Yes this is indeed true for this problem instance :)

Taking therefore the case with n = 1 and making a guess based on numerical trial, it seems that the optimal solution is λ = 2, d = k and t = λ + d, taking a limit as k \to \infty. The smallest eigenvalue of Q appears to approach 0 from below, so that the SDP constraint is satisfied only in the limit.

Assuming that the above is right (please do check since I was not that thorough), the fact that the optimiser is not obtainable is the root cause of the problem.

Hmm, I'm afraid I'm not sure about this. Slater's condition should hold (there exists a feasible N and λ, and therefore t, such that Q > 0; i.e., Q lies in the interior of the cone), which means that there should exist a minimizing primal-dual pair since the optimal value is finite. (I could be wrong, but taking, say, λ = 2 and d = 3 elementwise, which makes T positive definite, and then making t large enough seems to suffice.)

angeris commented 4 years ago

In fact, picking t exactly equal to v' pinv(T) v (which is technically the original problem statement) makes the smallest eigenvalue of Q exactly 0. (A proof follows from the Schur complement of Q with respect to T, for any Q ≥ 0: the smallest eigenvalue hits 0 precisely when the t entry is made as small as possible, i.e., when t = v' pinv(T) v. So I think this is why the limit given isn't optimal.)
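For completeness, the standard (generalised) Schur complement fact being invoked here is:

Q = [t v'; v T] ≥ 0   ⟺   T ≥ 0,  v ∈ range(T),  and  t ≥ v' pinv(T) v,

so for fixed T ≥ 0 and v ∈ range(T) the smallest feasible t is exactly v' pinv(T) v, at which point Q is singular (its smallest eigenvalue is 0).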

goulart-paul commented 4 years ago

Slater's condition tells you something about the optimal values of the primal and dual problems; it doesn't guarantee the existence of a primal optimiser. The problem min_{x \ge 0} 1/x is convex, strictly feasible, and has the finite optimal value 0, but has no optimiser.

That same problem can be turned into an SDP to illustrate what I think is the issue you have. Write the same problem as

\min t
s.t.  1/x \le t
      x \ge 0

and then rewrite the inequality to get

\min t
s.t.  [t 1; 1 x] \succeq 0
      x \ge 0

A strictly feasible point is (t,x) = (2,2). If you set (t,x) = (z,1/z) for any positive z, you find that the matrix

Q = [z 1; 1 1/z]

is positive semidefinite (its eigenvalues are 0 and z + 1/z). You can produce a sequence of primal variables that converges to the optimal value by taking the limit as t = z \to 0, but the sequence doesn't produce an optimiser since it forces x = 1/z \to \infty.
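A quick numerical confirmation of the eigenvalue claim (a throwaway check, not part of the original discussion):

using LinearAlgebra

# det([z 1; 1 1/z]) = 0 and trace = z + 1/z, so the eigenvalues are 0 and z + 1/z.
for z in (0.1, 1.0, 10.0, 100.0)
    println("z = $z:  eigvals = ", eigvals([z 1.0; 1.0 1/z]))
end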

angeris commented 4 years ago

Apologies! This is indeed very true. (I misremembered the proof of strong duality: under Slater's condition for the primal problem, it guarantees that the dual optimal value is attained whenever the primal optimal value is finite, but the primal optimal value need not be attained. See, e.g., B&V's proof.) Of course, it would suffice to prove that both the primal and the dual are strictly feasible in order for both optimal values to be attained.

Either way, I'm still not fully convinced (though I am, of course, not certain) that the original problem has an optimal value that is not attained. The main reason is that the problem I gave in the OP is the dual of a nonconvex QCQP, so its dual (i.e., the dual of the problem in the OP) should be an SDP relaxation of the original nonconvex QCQP. If that is the case, and if the SDP relaxation of a nonconvex QCQP is always strictly feasible, then the value of the problem in the OP (as the dual of that relaxation) should be attained. (So, in this case, there should be an optimal, attainable primal-dual pair.)

Thank you both (@migarstka and @goulart-paul), by the way, for taking so much time! :)

goulart-paul commented 4 years ago

If I set n = 1 the optimal value of the problem appears to be 1. Set λ = 2 and t = λ + d, so that (t-d)/2 = 1. For any positive d, the matrix Q will have a single negative eigenvalue, but in the limit as d\to \infty this eigenvalue appears to converge to zero, so that Q is PSD in the limit. Here's what I see if I do that numerically:

[attached image of the numerical results]
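A sketch of that experiment (a reconstruction, not the original script): fix λ = 2, set t = λ + d, and track the smallest eigenvalue of Q and the value (t - d)/2 as d grows.

using LinearAlgebra

λ = 2.0
for d in 10.0 .^ (1:6)
    t = λ + d
    # n = 1, so the blocks of T and the entries of v are scalars.
    T = [(-2 + 4d - λ)  2d
          2d            (d + λ)]
    v = [2d, d]
    Q = [t v'; v T]
    println("d = $d   eigmin(Q) = $(eigmin(Symmetric(Q)))   (t - d)/2 = $((t - d)/2)")
end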

I seemingly have a sequence of primal variables that converge to the feasible set and achieve the optimal value only in the limit.

angeris commented 4 years ago

I seemingly have a sequence of primal variables that converge to the feasible set and achieve the optimal value only in the limit.

Sure! Additionally, setting t = λ + d + ε gives a feasible point (with Q ≥ 0) for d large enough (how large depends on ε > 0), so there is a sequence of feasible points approaching the optimal value (by shrinking ε as d is increased), but, as you showed, the optimal value itself is attained only in the limit.
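(Concretely, in the n = 1 case with λ = 2 and t = λ + d + ε, the Schur complement works out to t - v' pinv(T) v = ε - 4/(d - 2), so any d ≥ 2 + 4/ε gives Q ≥ 0.)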

I wonder why this is the case, though. It's generally surprising to me that this problem is so ill-behaved, but I guess that's just how it goes...

Additionally: how were you able to work out so easily that the minimal t for given λ and d is t = λ + d? I agree that it's true, but it's certainly not immediately obvious to me!

Thanks!

(I will also close this issue, since it seems to be the case that this is not a problem with the solver, but rather with the problem!)

goulart-paul commented 4 years ago

I would guess that your dual problem is suspect somehow. Perhaps there are no strictly feasible dual points. Maybe a result like Corollary 5.3.10 in the text by Borwein is helpful to you (I didn't check it, though).

The condition t = λ + d was just a guess based on solving the problem numerically with n = 1 and looking at what results I was getting at different convergence tolerances.

angeris commented 4 years ago

I would guess that your dual problem is suspect somehow. Perhaps there are no strictly feasible dual points. Maybe a result like Corollary 5.3.10 in the text by Borwein is helpful to you (I didn't check it, though).

Yeah, I was essentially referencing this result originally (which is why I noted that the dual problem of the one above can achieve its optimal value) and imagined that the dual was also strictly feasible, but, alas, it seems that this latter statement is not true. :( I will try reformulating it in some other way to see if it's possible to avoid this problem!

Anyways, thank you yet again!