amontoison opened 3 months ago
I think this is expected. We don't try to compress the non-zero terms, even in trivial cases. Is there any evidence that this is a performance concern?
As a trivial example, a problem with one variable can have > 1 element in the Hessian:
julia> using JuMP, Ipopt
julia> model = Model(Ipopt.Optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Ipopt
julia> @variable(model, x)
x
julia> @objective(model, Min, sin(x))
sin(x)
julia> @constraint(model, sin(x) >= 0)
sin(x) - 0.0 ≥ 0
julia> optimize!(model)
This is Ipopt version 3.14.14, running with linear solver MUMPS 5.6.2.
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 1
Number of nonzeros in Lagrangian Hessian.............: 2
The issue is not that you don't compress the non-zero terms; it is that you detect non-zero terms where there are zeros in the Hessian.
If these additional nnz are on the diagonal, they don't impact the coloring and thus the performance; you just allocate more storage than needed (+n in the worst case).
If the detected non-zeros are not on the diagonal, you might require more colors (and thus more directional derivatives) to retrieve all non-zeros of the Hessian of the Lagrangian.
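To make this concrete, here is a rough sketch (a toy greedy grouping written for illustration, not the star coloring actually used for symmetric matrices): columns are grouped so that no two columns in a group have a nonzero in a common row. In this small example, spurious diagonal entries leave the number of groups unchanged, while a spurious off-diagonal entry forces an extra group, i.e. an extra Hessian-vector product.
using SparseArrays

# Toy structurally-orthogonal column grouping: two columns conflict if they
# both have a nonzero in some common row. Cruder than star coloring, but it
# illustrates the effect of spurious entries on the detected pattern.
function greedy_column_groups(H::SparseMatrixCSC)
    A = H + sparse(H')                    # symmetrize the stored lower triangle
    n = size(A, 2)
    colors = zeros(Int, n)
    for j in 1:n
        forbidden = Set{Int}()
        for i in findnz(A[:, j])[1]       # rows touched by column j
            for k in findnz(A[i, :])[1]   # other columns touching those rows
                colors[k] != 0 && push!(forbidden, colors[k])
            end
        end
        c = 1
        while c in forbidden
            c += 1
        end
        colors[j] = c
    end
    return maximum(colors)
end

H_true    = sparse([1, 3], [1, 3], ones(2), 4, 4)                     # true pattern
H_diag    = sparse(1:4, 1:4, ones(4))                                 # + spurious diagonal entries
H_offdiag = sparse([1, 2, 3, 4, 2], [1, 2, 3, 4, 1], ones(5), 4, 4)   # + a spurious (2, 1)

greedy_column_groups(H_true)     # 1
greedy_column_groups(H_diag)     # 1  -> only extra storage
greedy_column_groups(H_offdiag)  # 2  -> an extra Hessian-vector product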
Do you have an example?
Is the claim something like this:
julia> import MathOptInterface as MOI
julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
MOI.VariableIndex(1)
MOI.VariableIndex(2)
julia> model = MOI.Nonlinear.Model();
julia> objective = :(sin($(x[1])) + $(x[2])^1)
:(sin(MOI.VariableIndex(1)) + MOI.VariableIndex(2) ^ 1)
julia> MOI.Nonlinear.set_objective(model, objective)
julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);
julia> MOI.initialize(evaluator, [:Hess])
julia> MOI.hessian_lagrangian_structure(evaluator)
2-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2)
Where (2, 2) in the Hessian is always 0.0?
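A quick numerical check (a sketch continuing from the evaluator above; the evaluation point is arbitrary):
H = zeros(2)  # one slot per entry of hessian_lagrangian_structure(evaluator)
# σ = 1.0 scales the objective; there are no constraints, so μ is empty.
MOI.eval_hessian_lagrangian(evaluator, H, [1.5, 2.0], 1.0, Float64[])
H[1]  # -sin(1.5): the (1, 1) entry
H[2]  # 0.0: the structurally "nonzero" (2, 2) entry is always zero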
Are there examples where this matters?
using NLPModels, NLPModelsJuMP, JuMP, SparseArrays
nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i] ≤ x[i = 1:8] ≤ uvar[i], start = x0[i])
@constraint(nlp, 1 - 0.0025 * (x[4] + x[6]) ≥ 0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4]) ≥ 0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5]) ≥ 0)
@NLconstraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333 ≥ 0)
@NLconstraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4] ≥ 0)
@NLconstraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5] ≥ 0)
@objective(nlp, Min, x[1] + x[2] + x[3])
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals)
julia> sparse(rows, cols, vals)
8×8 SparseMatrixCSC{Int64, Int64} with 13 stored entries:
1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1 ⋅ 1 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ 1 ⋅ ⋅ ⋅
1 ⋅ ⋅ ⋅ ⋅ 1 ⋅ ⋅
⋅ 1 ⋅ ⋅ ⋅ ⋅ 1 ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ 1
It seems that the new nonlinear API fixed something :thinking:
using NLPModels, NLPModelsJuMP, JuMP, SparseArrays
nlp = Model()
x0 = [5000, 5000, 5000, 200, 350, 150, 225, 425]
lvar = [100, 1000, 1000, 10, 10, 10, 10, 10]
uvar = [10000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]
@variable(nlp, lvar[i] ≤ x[i = 1:8] ≤ uvar[i], start = x0[i])
@constraint(nlp, 1 - 0.0025 * (x[4] + x[6]) ≥ 0)
@constraint(nlp, 1 - 0.0025 * (x[5] + x[7] - x[4]) ≥ 0)
@constraint(nlp, 1 - 0.01 * (x[8] - x[5]) ≥ 0)
@constraint(nlp, x[1] * x[6] - 833.33252 * x[4] - 100 * x[1] + 83333.333 ≥ 0)
@constraint(nlp, x[2] * x[7] - 1250 * x[5] - x[2] * x[4] + 1250 * x[4] ≥ 0)
@constraint(nlp, x[3] * x[8] - 1250000 - x[3] * x[5] + 2500 * x[5] ≥ 0)
@objective(nlp, Min, x[1] + x[2] + x[3])
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
sparse(rows, cols, vals, 8, 8)
8×8 SparseMatrixCSC{Int64, Int64} with 5 stored entries:
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅
1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅
Benoît helped me add support for it today, and I had never tested it before tonight.
Update: I don't use the nonlinear API in that case; the constraints are quadratic.
Your @NLconstraints are actually quadratic, so you don't need to do AD on them.
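As an aside, one way to see this from the JuMP side (a sketch on the models above): quadratic constraints added with @constraint are recorded as structured constraint types, whereas legacy @NLconstraint rows live in the separate nonlinear block and do not appear in this list.
# List the structured constraint types JuMP recorded and how many of each.
for (F, S) in list_of_constraint_types(nlp)
    println(F, " in ", S, ": ", num_constraints(nlp, F, S))
end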
Here's the underlying issue:
julia> x = MOI.VariableIndex.(1:2)
2-element Vector{MathOptInterface.VariableIndex}:
MOI.VariableIndex(1)
MOI.VariableIndex(2)
julia> model = MOI.Nonlinear.Model();
julia> MOI.Nonlinear.set_objective(model, :($(x[1]) * $(x[2])))
julia> evaluator = MOI.Nonlinear.Evaluator(model, MOI.Nonlinear.SparseReverseMode(), x);
julia> MOI.initialize(evaluator, [:Hess])
julia> MOI.hessian_lagrangian_structure(evaluator)
3-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2)
(2, 1)
So I assume that somewhere we are detecting Hessian sparsity, and if we see op(x, y) then we add (x, x), (x, y), and (y, y). We should special-case *.
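Something along these lines, purely as a sketch of the idea (a hypothetical hessian_pairs helper, not MOI's actual detection code): propagate index pairs through the expression tree, treating + as a union and a product of two distinct variables as contributing only the cross term.
# Hypothetical sketch, not MOI's implementation.
variables(s::Symbol) = Set([s])
variables(::Number) = Set{Symbol}()
variables(ex::Expr) = union(Set{Symbol}(), variables.(ex.args[2:end])...)

hessian_pairs(::Union{Symbol,Number}) = Set{Tuple{Symbol,Symbol}}()

function hessian_pairs(ex::Expr)
    op, args = ex.args[1], ex.args[2:end]
    if op == :+
        return union(hessian_pairs.(args)...)
    elseif op == :* && length(args) == 2 && all(a -> a isa Symbol, args)
        x, y = args
        return x == y ? Set([(x, x)]) : Set([(x, y)])  # only the cross term
    else
        # Conservative fallback: pair every variable in the subtree with every
        # other variable (ignoring symmetry / canonical ordering for brevity).
        vars = collect(variables(ex))
        return Set((x, y) for x in vars for y in vars)
    end
end

hessian_pairs(:(x * y))           # Set([(:x, :y)]) instead of three pairs
hessian_pairs(:(x * y + sin(z)))  # Set([(:x, :y), (:z, :z)])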
Nice catch Oscar :+1: I will try to isolate an example tomorrow with additional non-zeros that are not on the diagonal.
@odow I found a simple example with an extra off-diagonal coefficient:
using NLPModels, NLPModelsJuMP, JuMP, SparseArrays
nlp = Model()
x0 = [1745, 12000, 110, 3048, 1974, 89.2, 92.8, 8, 3.6, 145]
lvar = [0.00001, 0.00001, 0.00001, 0.00001, 0.00001, 85, 90, 3, 1.2, 145]
uvar = [2000, 16000, 120, 5000, 2000, 93, 95, 12, 4, 162]
@variable(nlp, lvar[i] ≤ x[i = 1:10] ≤ uvar[i], start = x0[i])
a = 0.99
b = 0.9
@expression(nlp, g1, 35.82 - 0.222 * x[10] - b * x[9])
@expression(nlp, g2, -133 + 3 * x[7] - a * x[10])
@expression(nlp, g5, 1.12 * x[1] + 0.13167 * x[1] * x[8] - 0.00667 * x[1] * x[8]^2 - a * x[4])
@expression(nlp, g6, 57.425 + 1.098 * x[8] - 0.038 * x[8]^2 + 0.325 * x[6] - a * x[7])
@constraint(nlp, g1 ≥ 0)
@constraint(nlp, g2 ≥ 0)
@constraint(nlp, -g1 + x[9] * (1 / b - b) ≥ 0)
@constraint(nlp, -g2 + (1 / a - a) * x[10] ≥ 0)
@constraint(nlp, g5 ≥ 0)
@constraint(nlp, g6 ≥ 0)
@constraint(nlp, -g5 + (1 / a - a) * x[4] ≥ 0)
@constraint(nlp, -g6 + (1 / a - a) * x[7] ≥ 0)
@constraint(nlp, 1.22 * x[4] - x[1] - x[5] == 0)
@constraint(nlp, 98000 * x[3] / (x[4] * x[9] + 1000 * x[3]) - x[6] == 0)
@constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0)
@objective(nlp, Min, 5.04 * x[1] + 0.035 * x[2] + 10 * x[3] + 3.36 * x[5] - 0.063 * x[4] * x[7])
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10)
10×10 SparseMatrixCSC{Int64, Int64} with 16 stored entries:
3 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 1 2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 ⋅ ⋅ 1 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 1 ⋅ ⋅ 1 ⋅ ⋅ ⋅
2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 4 ⋅ ⋅
⋅ ⋅ 1 1 ⋅ ⋅ ⋅ ⋅ 1 ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
I don't understand why H[5,2] is a non-zero.
I tried:
@constraint(nlp, x[2] / x[1] + x[5] / x[1] - x[8] == 0)
instead of
@constraint(nlp, (x[2] + x[5]) / x[1] - x[8] == 0)
and H[5,2] is no longer considered a non-zero.
using NLPModels, NLPModelsJuMP, JuMP, SparseArrays
nlp = Model()
@variable(nlp, x[i = 1:10])
@constraint(nlp, sum(x[i] for i = 1:10) / x[1] == 0)
@objective(nlp, Min, x[1]^4)
nlp2 = MathOptNLPModel(nlp)
rows, cols = hess_structure(nlp2)
nnz_nlp = length(rows)
vals = ones(Int, nnz_nlp)
H = sparse(rows, cols, vals, 10, 10)
The Hessian is completely dense :fearful:
10×10 SparseMatrixCSC{Int64, Int64} with 55 stored entries:
2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 1 1 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1 1 1 1 1 ⋅ ⋅ ⋅ ⋅ ⋅
1 1 1 1 1 1 ⋅ ⋅ ⋅ ⋅
1 1 1 1 1 1 1 ⋅ ⋅ ⋅
1 1 1 1 1 1 1 1 ⋅ ⋅
1 1 1 1 1 1 1 1 1 ⋅
1 1 1 1 1 1 1 1 1 1
I think this is even somewhat expected from our sparsity algorithm. We don't recognize that sum(x) / y is decomposable. The sparsity algorithm trades some false positives for simplicity. I guess SparseConnectivityTracer makes the opposite trade-off: it is exact, but there are cases where it is much slower than JuMP. I can see how this one might be problematic:
https://github.com/JuliaSmoothOptimizers/OptimizationProblems.jl/blob/458d8141bbe7ffc4552c713ec755241de7b4fa15/src/PureJuMP/tetra.jl#L37-L62
I'm still interested in seeing a model where our choice makes an important runtime difference (other than artificial examples like the one above).
Perhaps this issue is really asking for better insight on how different formulations might perform:
julia> using NLPModels, NLPModelsJuMP, JuMP, SparseArrays
julia> function nlp_hessian(model)
nlp = MathOptNLPModel(model)
rows, cols = hess_structure(nlp)
n = num_variables(model)
return SparseArrays.sparse(rows, cols, ones(Int, length(rows)), n, n)
end
nlp_hessian (generic function with 5 methods)
julia> begin
model = Model()
@variable(model, x[1:4])
@constraint(model, sum(x) / x[1] == 0)
nlp_hessian(model)
end
4×4 SparseMatrixCSC{Int64, Int64} with 10 stored entries:
1 ⋅ ⋅ ⋅
1 1 ⋅ ⋅
1 1 1 ⋅
1 1 1 1
julia> begin
model = Model()
@variable(model, x[1:4])
@constraint(model, sum(x[i] / x[1] for i in 1:4) == 0)
nlp_hessian(model)
end
4×4 SparseMatrixCSC{Int64, Int64} with 7 stored entries:
1 ⋅ ⋅ ⋅
1 1 ⋅ ⋅
1 ⋅ 1 ⋅
1 ⋅ ⋅ 1
julia> begin
model = Model()
@variable(model, x[1:4])
@variable(model, y)
@constraint(model, y == sum(x))
@constraint(model, y / x[1] == 0)
nlp_hessian(model)
end
5×5 SparseMatrixCSC{Int64, Int64} with 3 stored entries:
1 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅
1 ⋅ ⋅ ⋅ 1
That's a really cool insight actually!
As discussed with @mlubin during JuMP-dev 2024, we recently observed with Guillaume Dalle (@gdalle) that JuMP detects additional non-zeros on the diagonal of the Hessian beyond what is needed.
We also observed other discrepancies.
I don't think it's a bug, just a part of the code that was probably not optimized for corner cases.
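For anyone who wants to reproduce that comparison, here is a sketch assuming SparseConnectivityTracer's hessian_sparsity / TracerSparsityDetector interface, on the x[1] * x[2] example from above:
using SparseConnectivityTracer

# Exact global Hessian sparsity of x -> x[1] * x[2]: only the off-diagonal
# cross term is structurally nonzero, whereas the JuMP/MOI structure above
# also lists the (1, 1) and (2, 2) diagonal entries.
hessian_sparsity(x -> x[1] * x[2], rand(2), TracerSparsityDetector())
# expected: a 2×2 sparse Bool matrix with entries only at (1, 2) and (2, 1)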