lanl-ansi / MathOptAI.jl

Embed trained machine learning predictors in JuMP
https://lanl-ansi.github.io/MathOptAI.jl/

[breaking] return config struct from each layer #67

Closed: odow closed this issue 1 week ago

odow commented 3 months ago

As discussed with Russell and Kaarthik

odow commented 2 months ago

This is a breaking change because it changes the return of add_predictor. I'm going to wait a bit longer before adding it (e.g., once the repo is ready to be made public and we've had a bit more feedback on the current design).

odow commented 2 months ago

So playing around with this. One option is:

struct ModelComponents
    variables::Vector{Any}
    constraints::Vector{Any}
end

function add_predictor(model::JuMP.Model, ::Tanh, x::Vector)
    y = JuMP.@variable(model, [1:length(x)], base_name = "moai_Tanh")
    _set_bounds_if_finite.(y, -1.0, 1.0)
    cons = JuMP.@constraint(model, y .== tanh.(x))
    return ModelComponents(y, cons), y
end

function add_predictor(::JuMP.Model, ::ReducedSpace{Tanh}, x::Vector)
    return ModelComponents(Any[], Any[]), tanh.(x)
end

with syntax

components, y = add_predictor(model, predictor, x)

Another option is

struct PredictionModel{T}
    output::Vector{T}
    variables::Vector{Any}
    constraints::Vector{Any}
end

function add_predictor(model::JuMP.Model, ::Tanh, x::Vector)
    y = JuMP.@variable(model, [1:length(x)], base_name = "moai_Tanh")
    _set_bounds_if_finite.(y, -1.0, 1.0)
    cons = JuMP.@constraint(model, y .== tanh.(x))
    return PredictionModel(y, y, cons)
end

function add_predictor(::JuMP.Model, ::ReducedSpace{Tanh}, x::Vector)
    return PredictionModel(tanh.(x), Any[], Any[])
end

with syntax

ml_model = add_predictor(model, predictor, x)
y = ml_model.output

Yet another option is:

y = add_predictor(model, predictor, x)
components, y = add_predictor(model, predictor, x; return_components = true)
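
As a rough sketch of how the keyword-controlled variant might look, reusing the ModelComponents struct and the Tanh example above (the method body is illustrative only):

# Sketch only: note that the return type changes with the flag.
function add_predictor(
    model::JuMP.Model,
    ::Tanh,
    x::Vector;
    return_components::Bool = false,
)
    y = JuMP.@variable(model, [1:length(x)], base_name = "moai_Tanh")
    _set_bounds_if_finite.(y, -1.0, 1.0)
    cons = JuMP.@constraint(model, y .== tanh.(x))
    if return_components
        return ModelComponents(y, cons), y
    end
    return y
end
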
odow commented 2 months ago

Thoughts @Robbybp?

Robbybp commented 2 months ago

The idea is that there is no overlap between output (or y) and variables? For now, I like wrapping everything into one struct.

Are you thinking that variables/constraints are completely unstructured, or somehow mimic the structure of the network (e.g. are indexed by nodes/layers)? Asking because I think this would be nice, but I'm not sure if it can/should be generalized.
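
One possible reading of "indexed by nodes/layers", as a purely hypothetical sketch (LayeredComponents is not a proposed type, just an illustration of the question):

# Hypothetical layer-indexed variant of ModelComponents.
struct LayeredComponents
    variables::Vector{Vector{Any}}    # variables[i] holds the variables added by layer i
    constraints::Vector{Vector{Any}}  # constraints[i] holds the constraints added by layer i
end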

odow commented 2 months ago

The idea is that there is no overlap between output (or y) and variables?

Nah, with overlap.

Perhaps each predictor should have a dedicated Result struct that documents how to inspect the formulation.
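
For instance, a dedicated result struct for the Tanh example above might look like the following; TanhFormulation is a hypothetical name used only for illustration:

# Hypothetical predictor-specific result struct.
struct TanhFormulation
    outputs::Vector{JuMP.VariableRef}        # the moai_Tanh variables
    constraints::Vector{JuMP.ConstraintRef}  # the y .== tanh.(x) constraints
end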

Robbybp commented 2 months ago

Personally, I'd prefer no overlap.

Perhaps each predictor should have a dedicated Result struct that documents how to inspect the formulation.

Maybe, although I could see this getting complicated quickly. Maybe a default Result struct that keeps internal variables and constraints in lists, with an option to return a "more structured" result if the predictor supports it?

odow commented 2 months ago

Personally, I'd prefer no overlap.

One issue is that the type of our result y can depend on the input, so I think there is value in having all variables in one place, and a list of the outputs (which may or may not be a new vector of variables).

Maybe a default Result struct that keeps internal variables and constraints in lists, with an option to return a "more structured" result if the predictor supports it?

Yip. That is exactly what this is: https://github.com/lanl-ansi/MathOptAI.jl/pull/80
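
To illustrate why the type of y can vary, compare the two Tanh methods sketched earlier in this thread: the full-space method returns new decision variables, while the reduced-space method returns nonlinear expressions built from x. A hedged usage sketch:

# Assuming the two Tanh methods from the first option above.
x = JuMP.@variable(model, [1:3])
_, y_full = add_predictor(model, Tanh(), x)                   # vector of new moai_Tanh variables
_, y_reduced = add_predictor(model, ReducedSpace(Tanh()), x)  # vector of tanh.(x) expressions, no new variables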

odow commented 1 month ago

Here are @pulsipher's thoughts from #82

I definitely think it is a good idea to have access to the variables and constraints created for the predictor. This helps to demystify the transformations and I believe provides a way to delete a predictor (i.e., manually deleting the variables and constraints).

Here are some of my thoughts on syntax to accomplish this. I prefer options 2 or 3.

1. Using the approach proposed in #80

I don't have any major issues with this approach, except that I think add_predictor should return y, formulation instead of formulation, y, since I imagine the formulation will often go unused and the user will just want y, _ = add_predictor. Admittedly, though, it might be a little annoying to have to deal with the second return value when it is often not needed.

One side question would be why is Vector{Any} required for the variables field?

2. Tweaking #80 to return only one object

Instead of returning y and formulation, we could return a reformulation block of the form:

struct SimpleFormulation{P<:AbstractPredictor} <: AbstractFormulation
    predictor::P
    outputs::Array{Any} # new field for `y`
    variables::Vector{Any}
    constraints::Vector{Any}
end

Then the user can just extract the outputs y from the formulation object as needed. Going one step further, one could even overload getindex so that:

Base.getindex(f::SimpleFormulation, inds...) = getindex(f.outputs, inds...)
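
A brief usage sketch of that overload (assuming option 2's SimpleFormulation); with it, the formulation object can be indexed as if it were y itself:

# Sketch only: index the formulation like the output vector.
formulation = add_predictor(model, predictor, x)
y_1 = formulation[1]    # same as formulation.outputs[1]
@objective(model, Min, sum(formulation.outputs))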

3. Store the formulation info in the model and use references

Adding a little more complexity, we could store the formulation objects in model.ext and have add_predictor return a predictor reference object that points to the stored formulation. Then the reference can be used to simplify the user experience, in the same manner as VariableRefs and ConstraintRefs (and similar to what DisjunctiveProgramming does with disjunctions). A rough API might look like:

predictor = add_predictor(model, nn, x)
y = outputs(predictor)
cons = transformation_constraints(predictor)
vars = transformation_variables(predictor)
predictor_obj = predictor_object(predictor)
set_transformation(predictor, TransformType()) # change the transformation used in-place
delete(model, predictor) # removes the predictor and anything it added to the model

Most of the above API could also be added with option 2. Moreover, we could also overload getindex to index the output variables.

odow commented 1 month ago

My thought for choosing option 1 is that most people don't actually want the formulation.

Most user code will do:

_, y = MathOptAI.add_predictor(model, predictor, x)
# or
y, _ = MathOptAI.add_predictor(model, predictor, x)
# or
y = first(MathOptAI.add_predictor(model, predictor, x))

I get that a single return simplifies things, but then I assume most people will immediately do:

formulation = MathOptAI.add_predictor(model, predictor, x)
y = formulation.output

This violates my design principle, and is one of the things I don't like about OMLT: https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/developers/design_principles.md#omlt

Option 3 just seems like option 2 with a layer of indirection. I don't think we really need it.

I think we can make delete work with option 1.
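
A minimal sketch of how delete could work under option 1, given the formulation object returned by #80's add_predictor; delete_predictor is a hypothetical helper, not part of the package:

# Hypothetical helper: remove everything a predictor added to the model.
function delete_predictor(model::JuMP.Model, formulation)
    for con in formulation.constraints
        JuMP.delete(model, con)
    end
    for var in formulation.variables
        JuMP.delete(model, var)
    end
    return
end

# Assuming the y-first return order discussed above.
y, formulation = MathOptAI.add_predictor(model, predictor, x)
delete_predictor(model, formulation)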

I'm not in a hurry to merge #80. At minimum, I'll wait until we can make the repo public and set up CI etc. It's a pretty big change to the library.

pulsipher commented 1 month ago

Another API idea would be to treat an added predictor as an operator-like object. For a NN this might look like:

NN = add_predictor(model, flux_chain) # creates a modelling object that can be treated as an operator
@objective(model, Min, sum(NN(x)))

where NN(x) generates the necessary expressions/constraints/variables as needed. Some sort of cache could be used to avoid making duplicate constraints/variables/expressions. Then that cache could be used to query NN for any desired information such as formulation constraints/variables.
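
One hypothetical shape for such a cache: a callable wrapper that stores the result of add_predictor keyed on the input vector, so calling it twice with the same x reuses the existing formulation. CachedPredictor and its fields are illustrative names only:

# Hypothetical caching wrapper around a built predictor.
struct CachedPredictor{P}
    model::JuMP.Model
    predictor::P
    cache::Dict{Any,Any}    # maps an input vector to the result of add_predictor
end

CachedPredictor(model, predictor) = CachedPredictor(model, predictor, Dict{Any,Any}())

function (cp::CachedPredictor)(x::Vector)
    return get!(cp.cache, x) do
        MathOptAI.add_predictor(cp.model, cp.predictor, x)
    end
end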

In my research, we commonly use a single NN in an optimal control problem that steps forward in time. So the same NN is used over n-1 different variable inputs. The above API, would enable us to only have to "add" the predictor once.

What is nice about this approach is that NN(x) resembles how I often see this formulated mathematically and creating the model object this way more closely follows the typical JuMP workflow for creating expressions, parameters, variables, and nonlinear operators. Taking it one step further, you could even have a JuMP-like macro:

@predictor(model, NN, flux_chain)

# containers of predictors could also be supported
@predictor(model, NN[i = 1:10], flux_chain[i])
odow commented 1 month ago

At one point I played around with y = NN(x) and y = NN(model, x). (I can't find it in the logs, so maybe I never pushed.)

I also considered the macros, but I don't know that they add much functionality. They're really just macros for the sake of it.

I decided to go with add_predictor for simplicity.

I'm open to revisiting this in the future, but it would require some large-scale practical examples that clearly demonstrate the benefit.

pulsipher commented 1 month ago

Enabling NN(x) could just be a matter of defining:

function (predictor::AbstractPredictor)(x::Vector)
    model = _get_model_or_error(x)
    return add_predictor(model, predictor, x)
end

where predictor can be created via build_predictor. This makes it more convenient to reuse the same predictor over different sets of inputs. For #80, we could have:

function (predictor::AbstractPredictor)(x::Vector)
    model = _get_model_or_error(x)
    return first(add_predictor(model, predictor, x))
end

which provides a natural way to avoid getting the formulation information if you don't want it.

I would still like to avoid creating redundant formulations: reuse the same nonlinear operator over a set of inputs, and reuse the same formulation if NN(x) is called more than once with the same inputs (admittedly, the latter is easy for the user to handle).

A large-scale example from the optimal control community (and my own research) is an MPC formulation that uses an NN model that steps forward in time. The NN takes the states $x$ and control variables $u$ at time $t_k$ and outputs the states at time $t_{k+1}$. Hence the same NN is used $K-1$ times, where $K$ is the number of time points. An ideal syntax for the NN portion of the optimal control formulation would be something like:

K = 100
raw_NN = # put raw Flux/Lux/PyTorch model here

@variable(model, x_lb[i] <= x[i = 1:10, 0:K] <= x_ub[i])
@variable(model, u_lb[j] <= u[j = 1:2, 0:K] <= u_ub[j])

NN = build_predictor(raw_NN, gray_box = true)
@constraint(model, [k = 0:K-1], NN(vcat(x[:, k], u[:, k])) .== x[:, k+1])

where only one new nonlinear operator is created for NN over all the inputs it receives (this point is probably best discussed in #90).

odow commented 1 week ago

Closing as won't-fix for now. Let's see how far our current setup gets us.