lanl-ansi / MathOptAI.jl

Embed trained machine learning predictors in JuMP
https://lanl-ansi.github.io/MathOptAI.jl/

Add extension point for `AbstractModel`s with `AbstractVariableRef`s #92

Open pulsipher opened 1 month ago

pulsipher commented 1 month ago

Related to #83, JuMP.AbstractModels like InfiniteOpt need to control how variables are generated during reformulation.

Consider, for instance, a two-stage stochastic program that uses an NN predictor, where we would like to do:

using InfiniteOpt, Distributions, MathOptAI, Flux

# Make a toy NN model
NN = Chain(Dense(2 => 3, relu))

# Setup
model = InfiniteModel()
@infinite_parameter(model, ξ ~ Uniform(0, 1))
@variable(model, x[1:2], Infinite(ξ)) # a 2nd stage input variable

# NN model
y = add_predictor(model, NN, x)

We would want y to be y(ξ) since the input is x(ξ), but instead first-stage variables y are returned, because the Infinite(ξ) tag is not used in @variable behind the scenes. Interestingly, this problem is avoided with reduced-space reformulations.
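
For reference, the reduced-space call that avoids this looks roughly as follows (a sketch, assuming the reduced_space keyword on add_predictor; the outputs come back as expressions in x(ξ) rather than new decision variables):

# Sketch: with a reduced-space formulation the outputs are expressions
# built from x(ξ), so no new (first-stage) output variables are created.
y = add_predictor(model, NN, x; reduced_space = true)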

So, what I would need is some way to appropriately add tags to the reformulation variables. Unfortunately, this might mean adding an extension point for each add_predictor method, since the relationship between the inputs and the generated variables varies from method to method.

Alternatively, I suppose I could overload every add_predictor method for InfiniteModels, but then I would end up having to copy nearly all the code (only tweaking how the variables are generated).

odow commented 1 month ago

> Alternatively, I suppose I could overload every add_predictor method for InfiniteModels, but then I would end up having to copy nearly all the code (only tweaking how the variables are generated).

It's unfortunate. But this would be my suggestion.

Could you give an example of what you would require for the Affine and ReLU layers?

I am very reluctant to start making things abstract before we absolutely need to.

odow commented 1 month ago

I think this should work now if you do y = add_predictor(model, NN, x; gray_box = true, reduced_space = true)?

pulsipher commented 1 month ago

> I think this should work now if you do y = add_predictor(model, NN, x; gray_box = true, reduced_space = true)?

It should work with reduced_space = true regardless of whether gray_box is used. The issue is getting it to work with full-space formulations (especially ones that don't have reduced-space alternatives), namely all the different ReLU reformulations such as big-M.

> It's unfortunate. But this would be my suggestion.

This would indeed be unfortunate, since any changes made to the predictors would then have to be made in two places, which has been quite annoying for the OMLT team.

> Could you give an example of what you would require for the Affine and ReLU layers?

For activation functions, I think only a single extension point would be needed for variable generation:

function add_activation_variable(model::JuMP.AbstractModel, xi, lb, ub; kwargs...)
    # Default fallback: create a plain JuMP variable and set its bounds.
    yi = JuMP.@variable(model; kwargs...)
    _set_bounds_if_finite(yi, lb, ub)
    return yi
end

In the InfiniteOpt extension, I would add something of the form:

function add_activation_variable(model::InfiniteOpt.InfiniteModel, xi, lb, ub; kwargs...)
    prefs = InfiniteOpt.parameter_refs(xi) # queries the infinite parameters an input may depend on
    if isempty(prefs) # is the input finite?
        yi = JuMP.@variable(model; kwargs...)
    else
        yi = JuMP.@variable(model; variable_type = InfiniteOpt.Infinite(prefs...), kwargs...)
    end
    _set_bounds_if_finite(yi, lb, ub) # InfiniteOpt's version of this function (not an extension)
    return yi
end

The add_predictor function for ReLU and ReLUBigM would then be:

function add_predictor(model::JuMP.AbstractModel, ::ReLU, x::Vector)
    ub = last.(get_variable_bounds.(x))
    y = [add_activation_variable(model, x[i], 0, ub[i], base_name = "moai_ReLU[$i]") for i in eachindex(x)]
    JuMP.@constraint(model, y .== max.(0, x))
    return y
end

function add_predictor(model::JuMP.AbstractModel, predictor::ReLUBigM, x::Vector)
    m = length(x)
    bounds = get_variable_bounds.(x)
    y = Vector{JuMP.variable_ref_type(model)}(undef, m)
    for i in 1:m
        y[i] = add_activation_variable(model, x[i], 0, last(bounds[i]), base_name = "moai_ReLU[$i]")
        lb, ub = bounds[i]
        z = add_activation_variable(model, x[i], nothing, nothing, binary = true)
        JuMP.@constraint(model, y[i] >= x[i])
        U = min(ub, predictor.M)
        JuMP.@constraint(model, y[i] <= U * z)
        L = min(max(0, -lb), predictor.M)
        JuMP.@constraint(model, y[i] <= x[i] + L * (1 - z))
    end
    return y
end

This API would also be readily compatible with the other activation functions, binary decision trees, and quantiles. Note that this also makes get_variable_bounds an extension point, so I could make it work with InfiniteOpt.GeneralVariableRefs.
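
For example, a complementarity-style ReLU reformulation (along the lines of ReLUQuadratic) could reuse the same extension points. The following is only a sketch, not a copy of the existing implementation:

function add_predictor(model::JuMP.AbstractModel, ::ReLUQuadratic, x::Vector)
    m = length(x)
    bounds = get_variable_bounds.(x)  # proposed extension point
    y = Vector{JuMP.variable_ref_type(model)}(undef, m)
    z = Vector{JuMP.variable_ref_type(model)}(undef, m)
    for i in 1:m
        lb, ub = bounds[i]
        # Both the positive part y and the slack z go through the extension point,
        # so an InfiniteOpt overload can tag them as Infinite(ξ) when needed.
        y[i] = add_activation_variable(model, x[i], 0, max(0, ub), base_name = "moai_ReLU[$i]")
        z[i] = add_activation_variable(model, x[i], 0, max(0, -lb), base_name = "moai_z[$i]")
    end
    JuMP.@constraint(model, x .== y .- z)
    JuMP.@constraint(model, y .* z .== 0)
    return y
end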

So then, the only layer that requires special treatment for full space is Affine, since each output is a linear combination of all the inputs, which means that every output becomes infinite if any single input is. An API might look something like:

# Variable extension point
function add_affine_variable(model::JuMP.AbstractModel, x, lb, ub; kwargs...)
    yi = JuMP.@variable(model; kwargs...)
    _set_bounds_if_finite(yi, lb, ub)
    return yi
end

# Method overloaded in the InfiniteOpt extension
function add_affine_variable(model::InfiniteOpt.InfiniteModel, x, lb, ub; kwargs...)
    prefs = InfiniteOpt.parameter_refs(x)
    if isempty(prefs) # are all of the inputs finite?
        yi = JuMP.@variable(model; kwargs...)
    else
        yi = JuMP.@variable(model; variable_type = InfiniteOpt.Infinite(prefs...), kwargs...)
    end
    _set_bounds_if_finite(yi, lb, ub) # InfiniteOpt's version of this function
    return yi
end

function add_predictor(model::JuMP.AbstractModel, predictor::Affine, x::Vector)
    m = size(predictor.A, 1)
    y = Vector{JuMP.variable_ref_type(model)}(undef, m)
    bounds = get_variable_bounds.(x)
    for i in 1:m
        y_lb, y_ub = predictor.b[i], predictor.b[i]
        for j in 1:size(predictor.A, 2)
            a_ij = predictor.A[i, j]
            lb, ub = bounds[j]
            y_ub += a_ij * ifelse(a_ij >= 0, ub, lb)
            y_lb += a_ij * ifelse(a_ij >= 0, lb, ub)
        end
        y[i] = add_affine_variable(model, x, y_lb, y_ub, base_name = "moai_Affine[$i]")
    end
    JuMP.@constraint(model, predictor.A * x .+ predictor.b .== y)
    return y
end
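
With these overloads in place, the original two-stage example should then just work; a sketch of the expected behavior:

# Expected behavior under the (hypothetical) InfiniteOpt overloads above:
# x depends on ξ, so the generated outputs carry the Infinite(ξ) tag, i.e. y = y(ξ).
y = add_predictor(model, NN, x)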

odow commented 1 month ago

So this is actually one reason why I didn't want to support AbstractModel in the signatures. It is exactly the laissez-faire issue described in https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/developers/design_principles.md#inputs-are-vectors

Could you use the ReducedSpace implementation in some way?

function add_predictor(model::InfiniteModel, predictor::AbstractPredictor, x::Vector)
    y_expr = add_predictor(model, ReducedSpace(predictor), x)
    y = @variable(model, [1:length(y_expr)])
    @constraint(model, y .== y_expr)
    return y
end

pulsipher commented 1 month ago

> Could you use the ReducedSpace implementation in some way?

Yes, but only for layers that support ReducedSpace. The core issue here is supporting layers that don't, such as ReLUBigM, ReLUSOS1, ReLUQuadratic, and BinaryDecisionTree, which all need to create variables as part of their formulations.

For these, having overloadable functions like add_activation_variable and get_variable_bounds would provide a way to control how variables are created. This API could be clearly documented as being intended only for extension writers of JuMP.AbstractModels and should be used with caution.
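
For completeness, get_variable_bounds could follow the same pattern as add_activation_variable. A sketch of what I have in mind (the names and the handling of non-variable inputs are illustrative only):

# Default fallback: query the JuMP bounds of a plain variable reference.
function get_variable_bounds(x::JuMP.AbstractVariableRef)
    lb = JuMP.has_lower_bound(x) ? JuMP.lower_bound(x) : -Inf
    ub = JuMP.has_upper_bound(x) ? JuMP.upper_bound(x) : Inf
    return lb, ub
end

# Overload in the InfiniteOpt extension: GeneralVariableRef supports the same
# JuMP bound queries, so this mostly just provides the dispatch.
function get_variable_bounds(x::InfiniteOpt.GeneralVariableRef)
    lb = JuMP.has_lower_bound(x) ? JuMP.lower_bound(x) : -Inf
    ub = JuMP.has_upper_bound(x) ? JuMP.upper_bound(x) : Inf
    return lb, ub
end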