blegat closed this 1 year ago
Thanks @blegat, at least at a high level this sounds reasonable to me. Is it required to break the NLP block into separate MOI constraints, or is there an intermediate step where we first address that point?
The first step does not require it but resolving the whole issue requires it.
At a high level this seems fine. Don't forget that people want to query derivatives from a JuMP model without acting as a solver (http://www.juliaopt.org/JuMP.jl/v0.19.2/nlp/#Querying-derivatives-from-a-JuMP-model-1). It's not clear how that fits in with bridges.
The big details I'm worried about are the specification of the expression graph format. Using Julia expression graphs would be pretty inefficient because every term in the expression is a GC-tracked object.
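To make the inefficiency concrete, here is a minimal sketch (the `Node` type and tape layout are hypothetical, not JuMP's actual AD internals) of the usual alternative: a flat expression tape where every node is a plain struct stored contiguously in one `Vector`, so the whole graph is a couple of allocations instead of one GC-tracked `Expr` object per subexpression.

```julia
# Hypothetical flat tape for f(x) = x[1]^2 + sin(x[2]).  Each node records an
# operation and the tape indices of its arguments; leaves reference variables
# or hold constants.  Children appear before parents (topological order).
struct Node
    op::Symbol            # :var, :const, or :call
    fun::Symbol           # function name when op == :call, :none otherwise
    args::Tuple{Int,Int}  # child indices into the tape (0 = unused)
    value::Float64        # variable index (as Float64) or constant value
end

tape = [
    Node(:var,   :none, (0, 0), 1.0),  # 1: x[1]
    Node(:const, :none, (0, 0), 2.0),  # 2: literal 2
    Node(:call,  :^,    (1, 2), 0.0),  # 3: x[1]^2
    Node(:var,   :none, (0, 0), 2.0),  # 4: x[2]
    Node(:call,  :sin,  (4, 0), 0.0),  # 5: sin(x[2])
    Node(:call,  :+,    (3, 5), 0.0),  # 6: x[1]^2 + sin(x[2])
]

# Forward evaluation is a single loop over the vector, with no tree walking.
function evaluate(tape, x)
    vals = zeros(length(tape))
    for (i, n) in enumerate(tape)
        vals[i] = n.op == :var   ? x[Int(n.value)] :
                  n.op == :const ? n.value :
                  n.fun == :^    ? vals[n.args[1]]^vals[n.args[2]] :
                  n.fun == :sin  ? sin(vals[n.args[1]]) :
                  n.fun == :+    ? vals[n.args[1]] + vals[n.args[2]] :
                  error("unsupported operation $(n.fun)")
    end
    return vals[end]
end

evaluate(tape, [3.0, 0.0])  # 9.0 == 3.0^2 + sin(0.0)
```

By contrast, the equivalent `:(x[1]^2 + sin(x[2]))` is a tree of `Expr` nodes, each a separately heap-allocated, GC-tracked object.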
From the perspective of SCIP (consuming expression graphs directly) this sounds good to me. I understand we would not need any bridge then, and could easily port the NLPBlock-based code to be used in direct_model mode.
Is anyone currently working on this?
If so, I'll be happy to pitch in and collaborate on this. Otherwise, I'll start taking a pass at this. Just let me know.
No one is working on this. But this is a massive project to undertake (on the order of months).
We're working on getting funding to support a sustained development effort, but things are still uncertain due to COVID etc.
It won't happen before JuMP 1.0. If you have development time for NLP, it's probably better fixing some of the outstanding issues in Ipopt etc.
Migration of Pavito, Pajarito, Alpine to MOI are good near-term NLP activities.
@odow Entirely understood. I'll keep pushing out fixes as I come across them in the meantime.
That said my interest in this is somewhat selfish.
There’s a family of nonsmooth NLP solvers which require generalized derivative information (Lexicographic or Clarke type) that I’m getting interested in. So being able to specify a backend other than the standard AD scheme is of interest. I can do this currently by grabbing the NLPEvaluator from JuMP and manipulating it to construct a new evaluator. That said, this is a bit clunky (and prone to breaking when someone updates the AD in JuMP).
I’ve also had a few people chat with me about wanting to extract computational tapes from JuMP models (for testing things akin to forward-reverse interval contractors and so on…).
So, while this is definitely a long-term task, I’m interested in starting to get the MOI foundation for its setup.
@ccoffrin I’d be happy to do that if I ever have some genuinely free time 😊. That said, there are some research activities I’m looking at that may benefit from a more extensible NLP interface (which is motivating my interest in this).
prone to breaking when someone updates the AD in JuMP
This isn't going to change in the near-term.
being able to specify a backend other than the standard AD scheme is of interest
The first change on the AD in JuMP will be to enable this feature :)
Awesome! Thanks for the clarification.
I started a Google doc outlining the high-level tasks required for a re-write of NLP support in JuMP.
https://docs.google.com/document/d/1SfwFrIhLyzpRdXyHKMFKv_XKeyJ6mi6kvhnUZ2LEEjM/edit?usp=sharing
Comments and suggestions for a more detailed design are appreciated.
It has been on my TODO for literally 6 months to talk to you and work out how the ChainRules project fits into this story. It might be that it just doesn't, but it feels like it might.
@oxinabox I think if the plan to make a more generic derivative back-end works out, then ChainRules could be a part of that! The goal would be to have a more generic differentiation interface so users can pick solutions that are well suited to their specific use case.
@blegat @odow I was thinking of next steps for the CP sets, and I was wondering how to support nonlinear functions (like absolute value). Based on the current thoughts about this NLP issue, do you think adding `AbsoluteValueFunction <: AbstractScalarFunction` makes sense (with the only argument being an `AbstractScalarFunction`)? In the sense that this development should be useful with expression rewriting as it's currently being planned. Do you propose another kind of design for this functionality, or is it just too soon for such questions? Thanks in advance!
I don't see the harm in defining `AbsoluteValueFunction` and seeing where it takes us. It makes sense to explore the design space.
There are no real plans for other expression types. One option is AMPL-esque: https://github.com/jump-dev/AmplNLWriter.jl/blob/60b147e83d0c3801f1e565828b3ec486848a3b65/src/NLModel.jl#L248-L277. Another option is to investigate ModelingToolkit. The main requirement is that it supports functions with ~1e6 terms.
OK, thanks! At least, such a solution could fit better with bridges, I think.
Another option is to investigate ModelingToolkit. The main requirement is that it supports functions with ~1e6 terms.
There are a few demos already well beyond that 😅. We need symbolic array support to be completed to make a few things nicer, though. @blegat added an MOI binding to GalacticOptim.jl, so we'll be making use of all this in the background.
@ChrisRackauckas I had initially not studied ModelingToolkit a lot, because it does not seem to be well-equipped for discreteness. I have listed five pain points for now. Maybe they could be solved by using Symbolics.jl instead of ModelingToolkit.jl?
How do you make something like this work? This could be the way a user enters a count constraint.
@parameters t σ ρ β
sum([t, σ] .~ [ρ, β])
For now, this code yields the following error:
ERROR: MethodError: no method matching +(::Equation, ::Equation)
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
+(::ChainRulesCore.AbstractThunk, ::Any) at C:\Users\Thibaut\.julia\packages\ChainRulesCore\GciYT\src\differential_arithmetic.jl:138
+(::ChainRulesCore.Tangent{P, T} where T, ::P) where P at C:\Users\Thibaut\.julia\packages\ChainRulesCore\GciYT\src\differential_arithmetic.jl:162
...
Stacktrace:
[1] add_sum(x::Equation, y::Equation)
@ Base .\reduce.jl:24
[2] _mapreduce
@ .\reduce.jl:408 [inlined]
[3] _mapreduce_dim
@ .\reducedim.jl:318 [inlined]
[4] #mapreduce#672
@ .\reducedim.jl:310 [inlined]
[5] mapreduce
@ .\reducedim.jl:310 [inlined]
[6] #_sum#682
@ .\reducedim.jl:878 [inlined]
[7] _sum
@ .\reducedim.jl:878 [inlined]
[8] #_sum#681
@ .\reducedim.jl:877 [inlined]
[9] _sum
@ .\reducedim.jl:877 [inlined]
[10] #sum#679
@ .\reducedim.jl:873 [inlined]
[11] sum(a::Vector{Equation})
@ Base .\reducedim.jl:873
[12] top-level scope
@ REPL[13]:1
Also, not all constraints are equality or inequality: you might have `alldifferent(t, σ, ρ, β)` as a constraint. In a sense, this is similar to `alldifferent(t, σ, ρ, β) == true`, but the latter uses reification (which is more cumbersome to implement in the solvers). I guess this might be solved by adding a new field in `OptimizationSystem` for global constraints. Also, this interface does not yet allow for integer or Boolean variables, but I guess this should not be too hard to add.
At its core, ModelingToolkit seems to be excellent for computing derivatives, but what about functions that do not have a derivative? (At least not in a classical sense…) How does the package work when there is no way to compute a derivative? How strong is the differentiability hypothesis?
(Also, I guess it would be useful to deal with functions that are not differentiable everywhere and to compute subgradients.)
I didn't see a lot of ways to add new rules for rewriting expressions in ModelingToolkit.jl, only in Symbolics.jl. However, to get to an optimisation solver, it's not sufficient to simplify expressions; you also need to reach a form that the targeted solver allows (like second-order cones and linear constraints). Is there a way to write this kind of transformation on top of ModelingToolkit.jl/Symbolics.jl, i.e. without throwing tens of thousands of lines of code at it? (For instance, MOI already has an excellent solution to this problem with bridges.)
Also, you mention that GalacticOptim.jl has MOI support, but only for the current, legacy NLP support: the current code does not seem to be able to extract information about the problem, like linear constraints. I bet that such a system would be more complex to write.
Not all parameters are optimisation variables: you might want to keep some parameter values in your constraints, for instance to reoptimise quickly when just a few values change. (https://github.com/jump-dev/ParametricOptInterface.jl/blob/master/docs/src/example.md) Maybe this can be solved by having the user specify which parameters are optimisation variables and which ones are not. The problem that arises with this kind of parameter is that the product of two optimisation variables is unlikely to be convex, but if one of them is fixed then the expression is linear. This kind of information should be available to the rewriting system.
Maybe they could be solved by using Symbolics.jl instead of ModelingToolkit.jl?
ModelingToolkit.jl is built on Symbolics.jl. Anything that Symbolics.jl can represent can go into the ModelingToolkit.jl types, which give it a modeling frontend.
How do you make something like this work? This could be the way a user enters a count constraint.
The symbolic system is as confused as I am. What are you trying to do there? Build an equation summing the left and right sides? We probably could specialize `mapreduce` expressions, but I haven't seen anyone want it.
you might have alldifferent(t, σ, ρ, β) as a constraint.
Yeah, we'd have to add that expression.
this interface does not allow yet for integer or Boolean variables
using Symbolics
@variables a::Int b::Bool
At its core, ModelingToolkit seems to be excellent for computing derivatives, but what about functions that do not have a derivative? (At least not in a classical sense…) How does the package work when there is no way to compute a derivative? How strong is the differentiability hypothesis?
It uses the same rule expressions as the AD system, so it will handle `ifelse`, `max`, `abs`, etc. in exactly the same cases, in the same expression-generating way. Part of what we're doing in AD systems soon is subgradients and abs-normal forms, so those same rules would carry over (by the same people, too).
I didn't see a lot of ways to add new rules for rewriting expressions in ModelingToolkit.jl, only in Symbolics.jl
Good, because there's no such thing as a ModelingToolkit.jl expression. All expressions are Symbolics.jl expressions.
but rather to get to a form that the targeted solver allows (like second-order cones and linear constraints). Is there a way to write this kind of transformation on top of ModelingToolkit.jl/Symbolics.jl, i.e. without throwing tens of thousands of lines of code at it?
Yeah, this is rather trivial really. It's definitely a good application of E-graphs. I do have a student looking into this, though I think the main focus of GalacticOptim.jl right now is really on nonlinear support since that's the bulk of the use cases.
Also, you mention that GalacticOptim.jl has MOI support, but only for the current, legacy NLP support: the current code does not seem to be able to extract information about the problem, like linear constraints. I bet that such a system would be more complex to write.
Use `islinear` and `jacobian`. It's no different from other cases where linearity is specialized on. I'd need @blegat's help to actually make that connect to MOI, but getting that kind of symbolic information is trivial, two-lines-of-code stuff.
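For instance, a sketch of that kind of extraction using Symbolics.jl's `linear_expansion` and `jacobian` (these two functions exist in Symbolics.jl; the framing around them is just illustrative of what an MOI backend would consume):

```julia
using Symbolics

@variables x y

# linear_expansion(ex, v) returns (a, b, islinear) such that ex == a*v + b,
# i.e. a symbolic linearity check plus the coefficient in one call.
a, b, islin = Symbolics.linear_expansion(2x + 3y + 1, x)

# The coefficient matrix of a system of affine expressions, which is the
# structural information needed to build linear constraints:
J = Symbolics.jacobian([2x + 3y, x - y], [x, y])
```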
Not all parameters are optimisation variables
Look at the definition of OptimizationProblem. There are states (optimized for) and parameters (not optimized for).
Thanks for your detailed answer! Regarding the `sum` syntax, it's really about mimicking what Julia does with native numbers:
t = 1
σ = 2
ρ = 1
β = 4
[t, σ] .== [ρ, β] # BitVector: [true, false]
sum([t, σ] .== [ρ, β]) # 1: one of the two equalities holds
Yes, that makes sense. Open an issue on Symbolics.jl?
Reposted from https://github.com/jump-dev/MathOptInterface.jl/issues/1538 Modified to add in a point about parameters given https://jump.dev/JuMP.jl/stable/manual/nlp/#Create-a-nonlinear-parameter
@frapac from our discussion:
Modeling languages are great, but for many nonlinear problems you just want to provide an objective and constraints to the solvers with minimal hassle. In economics, in particular, the equations might be impossible to even write in existing modeling languages. In other cases, maybe you want to use all sorts of Symbolics.jl/ModelingToolkit.jl trickery, but in the end have a function which just operates on component arrays.
Basically, you just want to call the optimizers with primitives as native as possible and without any DSL getting in the way. This is in the same spirit as https://github.com/jump-dev/MatrixOptInterface.jl, I believe.
In its current form, JuMP's NLP support is very scalar-centric, and this was one of the things that led to the creation of https://github.com/SciML/GalacticOptim.jl (note, however, that the hope was that the scope of GalacticOptim would eventually expand into far more interesting things, largely around applying problem reductions as well as other fancy AD things).
On the MOI side, there might not be that many things missing for a more direct pass-through. We probably need dense Jacobians and Hessians, and I think that anything that needs to allocate will be a problem if it wants to do passthrough to GPU-style algorithms; it might also make passthrough of things like ComponentArrays difficult. But that stuff could come later.
What might this look like to a user?
First, it is useful to have an object to collect derivatives and make different AD backends convenient. I will describe the ones from GalacticOptim, but the gradient wrappers are something that could be standardized and shared in SciMLBase, or just made specific to JuMP/MOI. For example, https://github.com/SciML/GalacticOptim.jl/tree/master/src/function provides some wrappers for all sorts of AD. It could do something like the following.
Write your Julia function. SciML allows passing in a parameter vector `p`, which would be nice, but this could be done with just a closure if that doesn't fit into JuMP very well.
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
f = OptimizationFunction(rosenbrock)
Which would dispatch to the default auto-differentiation. Or, to use Zygote or finite differences:
OptimizationFunction(rosenbrock, GalacticOptim.AutoZygote())
OptimizationFunction(rosenbrock, GalacticOptim.AutoFiniteDiff())
Or to pass in your own derivatives:
rosenbrock_grad(x, p) = [-2 * (p[1] - x[1]) - 4 * p[2] * x[1] * (x[2] - x[1]^2),
                         2 * p[2] * (x[2] - x[1]^2)]  # analytic gradient
OptimizationFunction(rosenbrock, rosenbrock_grad)
Finally, if the problem is constrained, then in the current GalacticOptim (which could be changed) it can generate the Jacobians and Hessians as required. The current design has an optional `cons`, but I think it would be much nicer if these were decoupled. Anyway, not essential either way.
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
function con2_c(x,p)
[x[1]^2 + x[2]^2,
x[2]*sin(x[1])-x[1]]
end
OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff(); cons = con2_c)
First, consider if there were MOI extensions instead. @frapac suggested something like
model = MOI.nlpsolve(f, g!, h!, p, x_lb, x_ub, c_lb, c_ub)
attach_linear_constraints!(model, A, b)
attach_quadratic_constraints!(model, Q, q, d)
which I would hope could have a "differentiation wrapper" version:
p = # the parameters
fun = OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff(); cons = con2_c)
model = MOI.nlpsolve(fun, p, x_lb, x_ub, c_lb, c_ub)
attach_linear_constraints!(model, A, b)
attach_quadratic_constraints!(model, Q, q, d)
Note that I added the parameters in the vector `p` to be in the same spirit as parameters in JuMP.
In fact, if this existed, then I think GalacticOptim could be extremely short and almost not exist. Or it could just be the `OptimizationFunction` wrapper. But I would hope that we can still do all of the linear constraints, complementarity constraints, etc.
The last piece of GalacticOptim that might be worth considering is its role in auto-differentiation rules for the optimizers themselves. We want to be able to establish AD rules for optimizers (e.g. https://en.wikipedia.org/wiki/Envelope_theorem), but that might be better discussed down the line.
To be rrule-friendly, we would probably want to avoid the mutation of adding constraints ex post, and instead have them in the constructor of the problem type. Mutation makes for difficult rules.
From the "DSL" perspective, I think that JuMP might actually work with a few constraints on supported features, although I think I would prefer the MOI extension instead.
In particular, if you want to provide a "vectorized" objective (and optionally nonlinear constraint) then:
Otherwise, I think that the existing JuMP and MOI machinery around linear constraints would be perfectly fine. The user would only have a single variable to work with, but I think that is fine.
Alternatively, if you wanted this to live in JuMP, then, to see possible usage in a one-variable-limited JuMP, let me adapt https://jump.dev/JuMP.jl/stable/tutorials/Nonlinear%20programs/rosenbrock/
using JuMP
import Ipopt
import Test
import AutoDiffWrappersFromWherever
function example_rosenbrock()
model = Model(Ipopt.Optimizer)
set_silent(model)
# call wrappers discussed above
rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
obj = OptimizationFunction(rosenbrock, AutoFiniteDiff())
@variable(model, x[1:2]) # or @passthrough_variable(model, x[1:2])
@NLvector_objective(model, Min, obj)
# ERRORS IF MODEL HAS MORE THAN ONE VARIABLE OR IF SCALAR
optimize!(model)
Test.@test termination_status(model) == MOI.LOCALLY_SOLVED
Test.@test primal_status(model) == MOI.FEASIBLE_POINT
Test.@test objective_value(model) ≈ 0.0 atol = 1e-10
Test.@test value(x[1]) ≈ 1.0
Test.@test value(x[2]) ≈ 1.0
return
end
function example_constrained_rosenbrock()
model = Model(Ipopt.Optimizer)
set_silent(model)
# call wrappers discussed above
rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
function con2_c(x)
[x[1]^2 + x[2]^2,
x[2]*sin(x[1])-x[1]]
end
obj = OptimizationFunction(rosenbrock, AutoFiniteDiff(), cons = con2_c)
@variable(model, x[1:2])
@NLvector_objective(model, Min, obj)
# THIS ADDS THE CONSTRAINT AS WELL. HESSIAN OF LAGRANGIAN ATTACHED
optimize!(model)
Test.@test termination_status(model) == MOI.LOCALLY_SOLVED
Test.@test primal_status(model) == MOI.FEASIBLE_POINT
Test.@test objective_value(model) ≈ 0.0 atol = 1e-10
Test.@test value(x[1]) ≈ 1.0
Test.@test value(x[2]) ≈ 1.0
return
end
What would `@NLvector_objective` do? Basically just register the objective, and potentially the constraints, in their most raw form when providing the implementation for MOI calls. It can access the `OptimizationFunction` structure for the function calls and just call them directly.
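For concreteness, here is a sketch of the glue such a macro might generate. The `CallbackEvaluator` name and the two-callback layout are hypothetical, but the `MOI.AbstractNLPEvaluator` interface methods used are MOI's existing API:

```julia
import MathOptInterface as MOI

# Hypothetical wrapper: hold raw objective and gradient callbacks and expose
# them through MOI's nonlinear evaluator interface, calling them directly.
struct CallbackEvaluator{F,G} <: MOI.AbstractNLPEvaluator
    f::F     # x -> f(x)
    grad::G  # (g, x) -> writes the gradient of f at x into g
end

MOI.features_available(::CallbackEvaluator) = [:Grad]
MOI.initialize(::CallbackEvaluator, ::Vector{Symbol}) = nothing
MOI.eval_objective(d::CallbackEvaluator, x) = d.f(x)
function MOI.eval_objective_gradient(d::CallbackEvaluator, g, x)
    d.grad(g, x)
    return
end

# Wrapping f(x) = x[1]^2 + x[2]^2 by hand:
d = CallbackEvaluator(x -> sum(abs2, x), (g, x) -> (g .= 2 .* x))
MOI.eval_objective(d, [1.0, 2.0])  # 5.0
```

A solver that consumes an `MOI.NLPBlockData` would then call these methods without ever seeing an expression graph.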
ScalarNonlinearFunction is merged. I'll hold off closing this until we add VectorNonlinearFunction.
Closing as done. We now have `ScalarNonlinearFunction` and `VectorNonlinearFunction`.
We discussed with @ccoffrin, @harshangrjn and @kaarthiksundar what the next-generation NLP could look like. Here is a gist of the idea. We create two new function types. First, `NonlinearExpressionFunction`, which will be used by JuMP to give the objective function and the constraints (as `NonlinearExpressionFunction`-in-`EqualTo`/`GreaterThan`/`LessThan`) to the MOI backend. Then, `SecondOrderBlackBoxFunction` will be what NLP solvers such as Ipopt support. But NLP solvers such as Alpine can directly support `NonlinearExpressionFunction`, as they exploit the expression structure.
We can then define the bridges `AffinetoSecondOrderBlackBox` and `QuadratictoSecondOrderBlackBoxBridge`, which transform `ScalarAffineFunction` and `ScalarQuadraticFunction` respectively into `SecondOrderBlackBoxFunction` (as is currently done in the Ipopt MOI wrapper, for instance).
The transformation from `NonlinearExpressionFunction` to `SecondOrderBlackBoxFunction` can be achieved by an `AutomaticDifferentiationBridge{ADBackend}` which computes the callbacks using `ADBackend`. We can move the current AD implementation from JuMP to `MOI.Utilities` as a `DefaultADBackend`, and `full_bridge_optimizer` adds `AutomaticDifferentiationBridge{DefaultADBackend}`. If the user wants to change the AD backend to `OtherADBackend`, there will be a JuMP function that internally removes `AutomaticDifferentiationBridge{DefaultADBackend}` and adds `AutomaticDifferentiationBridge{OtherADBackend}`.
@kaarthiksundar would be interested in working on this, and we thought a good first step is to define `SecondOrderBlackBoxFunction` and create the bridges `AffinetoSecondOrderBlackBox` and `QuadratictoSecondOrderBlackBoxBridge`. Let us know what you think!