Yip. This doesn't surprise me in the slightest. Try AMPL via AmplNLWriter? (And shouldn't you be going to the airport soon?)
I think the main problem is that we scalarize Ax + b, so dense matrices generate a lot of terms in the tape.
And in this case, the Hessian is dense.
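For a sense of scale (my own illustration, not a measurement from this issue): a single dense Affine layer at d = 100 already scalarizes into d affine expressions of d terms each, so one layer alone puts roughly d^2 = 10,000 terms on the tape before any Sigmoid or the next layer is applied.

using JuMP
d = 100
model = Model()
@variable(model, x[1:d])
A, b = -ones(d, d), ones(d)
y = A * x .+ b                               # d scalar affine expressions, one per row of A
n_terms = sum(length(yi.terms) for yi in y)  # ≈ d^2 = 10_000 terms for a single layer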
But it's a good example for me to profile and understand, so let's leave this open.
using JuMP
import MathOptAI as MOAI

function make_model(d = 100)
    model = Model()
    output_d = 10
    @variable(model, x[1:d] >= 0)
    @objective(model, Min, sum(x .^ 2))
    # Two dense hidden layers with sigmoid activations, then a dense output layer.
    f = MOAI.Pipeline(
        MOAI.Affine(-ones(d, d), ones(d)),
        MOAI.Sigmoid(),
        MOAI.Affine(ones(d, d), ones(d)),
        MOAI.Sigmoid(),
        MOAI.Affine(-ones(output_d, d), ones(output_d)),
    )
    y = MOAI.add_predictor(model, MOAI.ReducedSpace(f), x)
    @constraint(model, -75.0 .<= y .<= 75.0)
    # Copy the nonlinear constraints into an MOI.Nonlinear.Model so we can
    # profile the Hessian evaluation in isolation.
    nl_model = MOI.Nonlinear.Model()
    n_cons = 0
    for con in JuMP.all_constraints(
        model;
        include_variable_in_set_constraints = false,
    )
        o = JuMP.constraint_object(con)
        MOI.Nonlinear.add_constraint(nl_model, o.func, o.set)
        n_cons += 1
    end
    variables = index.(all_variables(model))
    evaluator = MOI.Nonlinear.Evaluator(
        nl_model,
        MOI.Nonlinear.SparseReverseMode(),
        variables,
    )
    MOI.initialize(evaluator, [:Hess])
    hessian_structure = MOI.hessian_lagrangian_structure(evaluator)
    H = zeros(length(hessian_structure))
    x = ones(length(variables))
    σ = 1.0
    μ = ones(n_cons)
    # Closure that evaluates the Hessian of the Lagrangian at a random point.
    function profiler()
        x = rand(length(variables))
        MOI.eval_hessian_lagrangian(evaluator, H, x, σ, μ)
        return H
    end
    return profiler
end

profiler = make_model()
@time profiler();

using ProfileView
@profview profiler();
Nothing immediately jumps (if you will) out as a bottleneck. So I think this is the algorithm working as expected. It just doesn't like this particular example because of the A * x and the dense Hessian.
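For intuition on the density claim (a minimal sketch of my own, not from the thread): even one sigmoid of a dense affine combination couples every pair of inputs, so the Lagrangian Hessian structure is essentially a full lower triangle. This reuses the same evaluator pattern as the script above.

using JuMP
import MathOptInterface as MOI
d = 5
model = Model()
@variable(model, x[1:d])
@constraint(model, c, 1 / (1 + exp(-sum(x))) <= 0.9)  # one sigmoid of a dense affine expression
nl_model = MOI.Nonlinear.Model()
o = constraint_object(c)
MOI.Nonlinear.add_constraint(nl_model, o.func, o.set)
evaluator = MOI.Nonlinear.Evaluator(
    nl_model,
    MOI.Nonlinear.SparseReverseMode(),
    index.(all_variables(model)),
)
MOI.initialize(evaluator, [:Hess])
# Expect roughly a full lower triangle: d * (d + 1) ÷ 2 = 15 structural entries.
length(MOI.hessian_lagrangian_structure(evaluator))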
Try AMPL via AmplNLWriter?
Good idea, will do. Do we exploit "defined variables" in the .nl writer?
(And shouldn't you be going to the airport soon?)
Waiting for my flight :)
Do we exploit "defined variables" in the .nl writer?
Nope. But they use a different AD algorithm, so it might help.
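For reference, the AmplNLWriter route would look roughly like this (a sketch only; it assumes Ipopt_jll is installed and that the JuMP model from the script above is in scope):

using JuMP, AmplNLWriter, Ipopt_jll
set_optimizer(model, () -> AmplNLWriter.Optimizer(Ipopt_jll.amplexe))
optimize!(model)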
This is also the reason that I've added the recent GrayBox support. We can treat the full NN as a user-defined function and compute derivatives across the full model; there's no need to represent the internals explicitly at the JuMP level.
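The generic form of that idea in plain JuMP looks something like the following (my own sketch of the concept using a user-defined operator, not MathOptAI's actual GrayBox API; nn is a stand-in for the trained network's scalar output):

using JuMP
nn(x...) = sum(1 ./ (1 .+ exp.(-collect(x))))  # placeholder for the real network
model = Model()
@variable(model, x[1:100] >= 0)
# Register the whole network as one operator; JuMP differentiates it with
# ForwardDiff instead of taping every internal Ax + b and sigmoid.
@operator(model, op_nn, 100, nn)
@constraint(model, op_nn(x...) <= 75.0)

Note that, as I understand it, a multivariate operator registered this way only supplies gradients to the solver unless you also provide second-derivative information.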
Closing as won't fix for now. Performance issues are definitely on my radar. Slamming a NN at the reduced-space formulation like this is not ideal.
Probably due to the lack of common subexpressions, i.e. https://github.com/jump-dev/JuMP.jl/issues/3738.
For example:
I get:
Feel free to close this since it's a known issue, but I just wanted to document that I've been hitting this bottleneck.