jump-dev / MathOptInterface.jl

A data structure for mathematical optimization problems
http://jump.dev/MathOptInterface.jl/

[Nonlinear] sparsity pattern of Hessian with :(x * y) #2527

Open amontoison opened 3 months ago

amontoison commented 3 months ago

As discussed with @mlubin during JuMP-dev 2024, Guillaume Dalle (@gdalle) and I recently observed that JuMP detects more non-zeros on the diagonal of the Hessian than are needed.

We also observed other discrepancies:

I don't think that it's a bug, just a part of the code that was probably not optimized for corner cases.

odow commented 3 months ago

I think this is expected. We don't try to compress the non-zero terms, even in trivial cases. Is there any evidence that this is a performance concern?

odow commented 3 months ago

As a trivial example, a problem with one variable can have > 1 element in the Hessian:

julia> using JuMP, Ipopt

julia> model = Model(Ipopt.Optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Ipopt

julia> @variable(model, x)
x

julia> @objective(model, Min, sin(x))
sin(x)

julia> @constraint(model, sin(x) >= 0)
sin(x) - 0.0 ≥ 0

julia> optimize!(model)
This is Ipopt version 3.14.14, running with linear solver MUMPS 5.6.2.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        1
Number of nonzeros in Lagrangian Hessian.............:        2
amontoison commented 3 months ago

The issue is not that you don't compress the non-zero terms; it is that you detect non-zero terms where there are zeros in the Hessian.

If these additional non-zeros are on the diagonal, they don't impact the coloring and thus the performance. You just allocate more storage than needed (+n in the worst case).

If the detected non-zeros are not on the diagonal, you might require more colors (and thus more directional derivatives) to retrieve all non-zeros of the Hessian of the Lagrangian.
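
To illustrate the second point, here is a rough sketch of the coloring cost, assuming the ColoringProblem / GreedyColoringAlgorithm interface of SparseMatrixColorings.jl; the arrowhead-versus-dense patterns below are an illustrative example, not taken from a specific model:

using SparseArrays, SparseMatrixColorings

n = 10
# Arrowhead pattern: only variable 1 interacts with the others
# (first row, first column, and the (1, 1) entry are structurally non-zero).
rows = [collect(1:n); fill(1, n - 1)]
cols = [fill(1, n); collect(2:n)]
arrowhead = sparse(rows, cols, ones(length(rows)), n, n)

# Over-approximation of the same Hessian: a structurally dense pattern.
dense = sparse(ones(n, n))

problem = ColoringProblem(; structure = :symmetric, partition = :column)
algorithm = GreedyColoringAlgorithm(; decompression = :direct)

ncolors(coloring(arrowhead, problem, algorithm))  # an arrowhead needs only a couple of colors
ncolors(coloring(dense, problem, algorithm))      # a dense symmetric pattern needs n colors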

odow commented 3 months ago

Do you have an example?

odow commented 3 months ago

Is the claim something like this:

julia> import MathOptInterface as MOI

julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)

julia> model = MOI.Nonlinear.Model();

julia> objective = :(sin($(x[1])) + $(x[2])^1)
:(sin(MOI.VariableIndex(1)) + MOI.VariableIndex(2) ^ 1)

julia> MOI.Nonlinear.set_objective(model, objective)

julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);

julia> MOI.initialize(evaluator, [:Hess])

julia> MOI.hessian_lagrangian_structure(evaluator)
2-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 2)

Where (2, 2) in the Hessian is always 0.0?
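
One way to check this claim (a small sketch continuing the session above; the evaluation point is arbitrary):

# Fill the values matching the structure [(1, 1), (2, 2)] reported above.
H = zeros(length(MOI.hessian_lagrangian_structure(evaluator)))
MOI.eval_hessian_lagrangian(evaluator, H, [0.3, -1.2], 1.0, Float64[])
# H[1] == -sin(0.3): the (1, 1) entry of the Hessian of sin(x1) + x2^1.
# H[2] == 0.0: the (2, 2) entry is present in the structure but is always zero.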

Are there examples where this matters?

amontoison commented 3 months ago
using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i] ≤ x[i = 1:8] ≤ uvar[i], start = x0[i])

@constraint(nlp, 1 - 0.0025 * (x[4] + x[6]) ≥ 0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4]) ≥ 0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5]) ≥ 0)
@NLconstraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333 ≥ 0)
@NLconstraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4] ≥ 0)
@NLconstraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5] ≥ 0)

@objective(nlp, Min, x[1] + x[2] + x[3])

nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals)
8×8 SparseMatrixCSC{Int64, Int64} with 13 stored entries:
 1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  1  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  1  ⋅  ⋅  ⋅
 1  ⋅  ⋅  ⋅  ⋅  1  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅  ⋅  1  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  1
amontoison commented 3 months ago

It seems that the new nonlinear API fixed something :thinking:

using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i] ≤ x[i = 1:8] ≤ uvar[i], start = x0[i])

@constraint(nlp, 1 - 0.0025 * (x[4] + x[6]) ≥ 0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4]) ≥ 0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5]) ≥ 0)
@constraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333 ≥ 0)
@constraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4] ≥ 0)
@constraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5] ≥ 0)

@objective(nlp, Min, x[1] + x[2] + x[3])

nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
sparse(rows, cols, vals, 8, 8)
8×8 SparseMatrixCSC{Int64, Int64} with 5 stored entries:
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅
 1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅

Benoît helped me add support for it today, and I had never tested it before tonight.

Update: I don't use the nonlinear API in that case; the constraints are quadratic.

odow commented 3 months ago

Your @NLconstraints are actually quadratic, so you don't need to do AD on them.

Here's the underlying issue:

julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)

julia> model = MOI.Nonlinear.Model();

julia> MOI.Nonlinear.set_objective(model, :($(x[1]) * $(x[2])))

julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);

julia> MOI.initialize(evaluator, [:Hess])

julia> MOI.hessian_lagrangian_structure(evaluator)
3-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 2)
 (2, 1)
odow commented 3 months ago

So I assume that somewhere, when detecting the Hessian sparsity, if we see op(x, y) then we add (x, x), (x, y), and (y, y). We should special-case *.
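
For concreteness, a hypothetical sketch of the rule being described (not the actual MOI code): a generic binary op(x, y) conservatively contributes the full 2×2 block, while * only needs the cross term.

# Hypothetical illustration only (not MOI internals). Pairs are (row, col)
# entries of the lower triangle, with variables identified by their index.
function binary_hessian_pattern(op::Symbol, x::Int, y::Int)
    cross = (max(x, y), min(x, y))
    if op == :*
        return [cross]                    # d²(x * y) only couples x with y
    else
        return [(x, x), (y, y), cross]    # conservative fallback for op(x, y)
    end
end

binary_hessian_pattern(:*, 1, 2)  # [(2, 1)]
binary_hessian_pattern(:/, 1, 2)  # [(1, 1), (2, 2), (2, 1)]; (1, 1) is in fact zero for x / y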

amontoison commented 3 months ago

Nice catch Oscar :+1: I will try to isolate an example tomorrow with additional non-zeros that are not on the diagonal.

amontoison commented 3 months ago

@odow I found a simple example with a spurious off-diagonal coefficient:

using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

nlp = Model()

x0 = [1745, 12000, 110, 3048, 1974, 89.2, 92.8, 8, 3.6, 145]
lvar = [0.00001, 0.00001, 0.00001, 0.00001, 0.00001, 85, 90, 3, 1.2, 145]
uvar = [2000, 16000, 120, 5000, 2000, 93, 95, 12, 4, 162]
@variable(nlp, lvar[i] ≤ x[i = 1:10] ≤ uvar[i], start = x0[i])

a = 0.99
b = 0.9

@expression(nlp, g1, 35.82 - 0.222 * x[10] - b * x[9])
@expression(nlp, g2, -133 + 3 * x[7] - a * x[10])
@expression(nlp, g5, 1.12 * x[1] + 0.13167 * x[1] * x[8] - 0.00667 * x[1] * x[8]^2 - a * x[4])
@expression(nlp, g6, 57.425 + 1.098 * x[8] - 0.038 * x[8]^2 + 0.325 * x[6] - a * x[7])

@constraint(nlp, g1 ≥ 0)
@constraint(nlp, g2 ≥ 0)
@constraint(nlp, -g1 + x[9] * (1 / b - b) ≥ 0)
@constraint(nlp, -g2 + (1 / a - a) * x[10] ≥ 0)
@constraint(nlp, g5 ≥ 0)
@constraint(nlp, g6 ≥ 0)
@constraint(nlp, -g5 + (1 / a - a) * x[4] ≥ 0)
@constraint(nlp, -g6 + (1 / a - a) * x[7] ≥ 0)
@constraint(nlp, 1.22 * x[4] - x[1] - x[5] == 0)
@constraint(nlp, 98000 * x[3] / (x[4] * x[9] + 1000 * x[3]) - x[6] == 0)
@constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0)

@objective(nlp, Min, 5.04 * x[1] + 0.035 * x[2] + 10 * x[3] + 3.36 * x[5] - 0.063 * x[4] * x[7])
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10)
10×10 SparseMatrixCSC{Int64, Int64} with 16 stored entries:
 3  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  2  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅  1  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅  ⋅  1  ⋅  ⋅  ⋅
 2  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  4  ⋅  ⋅
 ⋅  ⋅  1  1  ⋅  ⋅  ⋅  ⋅  1  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅

I don't understand why H[5,2] is a non-zero.

I tried:

@constraint(nlp, x[2] / x[1] + x[5] / x[1] - x[8] == 0)

instead of

@constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0)

and H[5,2] is no longer considered a non-zero.

amontoison commented 3 months ago
using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

nlp = Model()

@variable(nlp,  x[i = 1:10])
@constraint(nlp, sum(x[i] for i = 1:10) / x[1]  == 0)
@objective(nlp, Min, x[1]^4)

nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10)

The Hessian is completely dense :fearful:

10×10 SparseMatrixCSC{Int64, Int64} with 55 stored entries:
 2  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  1  1  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  1  1  1  ⋅  ⋅  ⋅  ⋅  ⋅
 1  1  1  1  1  1  ⋅  ⋅  ⋅  ⋅
 1  1  1  1  1  1  1  ⋅  ⋅  ⋅
 1  1  1  1  1  1  1  1  ⋅  ⋅
 1  1  1  1  1  1  1  1  1  ⋅
 1  1  1  1  1  1  1  1  1  1
odow commented 3 months ago

I think this is even somewhat expected from our sparsity algorithm. We don't recognize that sum(x) / y is decomposable. The sparsity algorithm trades some false positives for simplicity. I guess SparseConnectivityTracer trades the other way: it is exact, at the cost of some cases where it is much slower than JuMP. I can see how this one might be problematic: https://github.com/JuliaSmoothOptimizers/OptimizationProblems.jl/blob/458d8141bbe7ffc4552c713ec755241de7b4fa15/src/PureJuMP/tetra.jl#L37-L62

I'm still interested in seeing a model where our choice makes an important runtime difference (other than artificial examples like the one above).
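
For reference, here is what a global tracing detector reports for the sum(x) / x[1] example above (a sketch assuming SparseConnectivityTracer.jl's hessian_sparsity / TracerSparsityDetector interface):

using SparseConnectivityTracer

f(x) = sum(x) / x[1]
# The traced pattern should be the arrowhead (first row, first column, and the
# (1, 1) entry) rather than a dense triangle: only x[1] couples with the rest.
hessian_sparsity(f, rand(10), TracerSparsityDetector())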

odow commented 1 month ago

Perhaps this issue is really asking for better insight into how different formulations might perform:

julia> using NLPModels, NLPModelsJuMP, JuMP, SparseArrays

julia> function nlp_hessian(model)
           nlp = MathOptNLPModel(model)
           rows, cols = hess_structure(nlp)
           n = num_variables(model)
           return SparseArrays.sparse(rows, cols, ones(Int, length(rows)), n, n)
       end
nlp_hessian (generic function with 5 methods)

julia> begin
           model = Model()
           @variable(model,  x[1:4])
           @constraint(model, sum(x) / x[1]  == 0)
           nlp_hessian(model)
       end
4×4 SparseMatrixCSC{Int64, Int64} with 10 stored entries:
 1  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅
 1  1  1  ⋅
 1  1  1  1

julia> begin
           model = Model()
           @variable(model,  x[1:4])
           @constraint(model, sum(x[i] / x[1] for i in 1:4) == 0)
           nlp_hessian(model)
       end
4×4 SparseMatrixCSC{Int64, Int64} with 7 stored entries:
 1  ⋅  ⋅  ⋅
 1  1  ⋅  ⋅
 1  ⋅  1  ⋅
 1  ⋅  ⋅  1

julia> begin
           model = Model()
           @variable(model,  x[1:4])
           @variable(model, y)
           @constraint(model, y == sum(x))
           @constraint(model, y / x[1] == 0)
           nlp_hessian(model)
       end
5×5 SparseMatrixCSC{Int64, Int64} with 3 stored entries:
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅
 1  ⋅  ⋅  ⋅  1
gdalle commented 1 month ago

That's a really cool insight actually!