Yip. This doesn't surprise me in the slightest. Try AMPL via AmplNLWriter? (And shouldn't you be going to the airport soon?)
I think the main problem is that we scalarize Ax + b, so dense matrices generate a lot of terms in the tape.
And in this case, the Hessian is dense.
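For a sense of scale (my own illustration, not a measurement from this issue): a single dense Affine layer at d = 100 already scalarizes into d affine expressions of d terms each, so one layer alone puts roughly d^2 = 10,000 terms on the tape before any Sigmoid or the next layer is applied.

using JuMP
d = 100
model = Model()
@variable(model, x[1:d])
A, b = -ones(d, d), ones(d)
y = A * x .+ b                               # d scalar affine expressions, one per row of A
n_terms = sum(length(yi.terms) for yi in y)  # ≈ d^2 = 10_000 terms for a single layer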
But it's a good example for me to profile and understand, so let's leave this open.
using JuMP
import MathOptAI as MOAI

function make_model(d = 100)
    model = Model()
    output_d = 10
    @variable(model, x[1:d] >= 0)
    @objective(model, Min, sum(x .^ 2))
    # Two dense hidden layers with sigmoid activations, then a dense output layer.
    f = MOAI.Pipeline(
        MOAI.Affine(-ones(d, d), ones(d)),
        MOAI.Sigmoid(),
        MOAI.Affine(ones(d, d), ones(d)),
        MOAI.Sigmoid(),
        MOAI.Affine(-ones(output_d, d), ones(output_d)),
    )
    y = MOAI.add_predictor(model, MOAI.ReducedSpace(f), x)
    @constraint(model, -75.0 .<= y .<= 75.0)
    # Copy the nonlinear constraints into an MOI.Nonlinear.Model so we can
    # profile the Hessian evaluation in isolation.
    nl_model = MOI.Nonlinear.Model()
    n_cons = 0
    for con in JuMP.all_constraints(
        model;
        include_variable_in_set_constraints = false,
    )
        o = JuMP.constraint_object(con)
        MOI.Nonlinear.add_constraint(nl_model, o.func, o.set)
        n_cons += 1
    end
    variables = index.(all_variables(model))
    evaluator = MOI.Nonlinear.Evaluator(
        nl_model,
        MOI.Nonlinear.SparseReverseMode(),
        variables,
    )
    MOI.initialize(evaluator, [:Hess])
    hessian_structure = MOI.hessian_lagrangian_structure(evaluator)
    H = zeros(length(hessian_structure))
    x = ones(length(variables))
    σ = 1.0
    μ = ones(n_cons)
    # Closure that evaluates the Hessian of the Lagrangian at a random point.
    function profiler()
        x = rand(length(variables))
        MOI.eval_hessian_lagrangian(evaluator, H, x, σ, μ)
        return H
    end
    return profiler
end

profiler = make_model()
@time profiler();

using ProfileView
@profview profiler();
Nothing immediately jumps (if you will) out as a bottleneck. So I think this is the algorithm working as expected. It just doesn't like this particular example because of the A * x and the dense Hessian.
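For intuition on the density claim (a minimal sketch of my own, not from the thread): even one sigmoid of a dense affine combination couples every pair of inputs, so the Lagrangian Hessian structure is essentially a full lower triangle. This reuses the same evaluator pattern as the script above.

using JuMP
import MathOptInterface as MOI
d = 5
model = Model()
@variable(model, x[1:d])
@constraint(model, c, 1 / (1 + exp(-sum(x))) <= 0.9)  # one sigmoid of a dense affine expression
nl_model = MOI.Nonlinear.Model()
o = constraint_object(c)
MOI.Nonlinear.add_constraint(nl_model, o.func, o.set)
evaluator = MOI.Nonlinear.Evaluator(
    nl_model,
    MOI.Nonlinear.SparseReverseMode(),
    index.(all_variables(model)),
)
MOI.initialize(evaluator, [:Hess])
# Expect roughly a full lower triangle: d * (d + 1) ÷ 2 = 15 structural entries.
length(MOI.hessian_lagrangian_structure(evaluator))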
Try AMPL via AmplNLWriter?
Good idea, will do. Do we exploit "defined variables" in the .nl writer?
(And shouldn't you be going to the airport soon?)
Waiting for my flight :)
Do we exploit "defined variables" in the .nl writer?
Nope. But they use a different AD algorithm, so it might help.
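For reference, the AmplNLWriter route would look roughly like this (a sketch only; it assumes Ipopt_jll is installed and that the JuMP model from the script above is in scope):

using JuMP, AmplNLWriter, Ipopt_jll
set_optimizer(model, () -> AmplNLWriter.Optimizer(Ipopt_jll.amplexe))
optimize!(model)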
This is also the reason that I've added the recent GrayBox support. We can treat the full NN as a user-defined function and compute derivatives across the full model; there's no need to represent the internals explicitly at the JuMP level.
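The generic form of that idea in plain JuMP looks something like the following (my own sketch of the concept using a user-defined operator, not MathOptAI's actual GrayBox API; nn is a stand-in for the trained network's scalar output):

using JuMP
nn(x...) = sum(1 ./ (1 .+ exp.(-collect(x))))  # placeholder for the real network
model = Model()
@variable(model, x[1:100] >= 0)
# Register the whole network as one operator; JuMP differentiates it with
# ForwardDiff instead of taping every internal Ax + b and sigmoid.
@operator(model, op_nn, 100, nn)
@constraint(model, op_nn(x...) <= 75.0)

Note that, as I understand it, a multivariate operator registered this way only supplies gradients to the solver unless you also provide second-derivative information.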
Closing as won't fix for now. Performance issues are definitely on my radar. Slamming a NN at the reduced-space formulation like this is not ideal.
Probably due to the lack of common subexpressions, i.e. https://github.com/jump-dev/JuMP.jl/issues/3738.
For example:
I get:
Feel free to close this since it's a known issue, but I just wanted to document that I've been hitting this bottleneck.