blegat closed this 1 year ago
Thanks @blegat, at least at a high level this sounds reasonable to me. Is it required to break the NLP block into separate MOI constraints, or is there an intermediate step where we first address that point?
The first step does not require it but resolving the whole issue requires it.
At a high level this seems fine. Don't forget that people want to query derivatives from a JuMP model without acting as a solver (http://www.juliaopt.org/JuMP.jl/v0.19.2/nlp/#Querying-derivatives-from-a-JuMP-model-1). It's not clear how that fits in with bridges.
The big details I'm worried about are the specification of the expression graph format. Using Julia expression graphs would be pretty inefficient because every term in the expression is a GC-tracked object.
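To make the inefficiency concrete, here is a minimal sketch (the `Node` type and tape layout are hypothetical, not JuMP's actual AD internals) of the usual alternative: a flat expression tape where every node is a plain struct stored contiguously in one `Vector`, so the whole graph is a couple of allocations instead of one GC-tracked `Expr` object per subexpression.

```julia
# Hypothetical flat tape for f(x) = x[1]^2 + sin(x[2]).  Each node records an
# operation and the tape indices of its arguments; leaves reference variables
# or hold constants.  Children appear before parents (topological order).
struct Node
    op::Symbol            # :var, :const, or :call
    fun::Symbol           # function name when op == :call, :none otherwise
    args::Tuple{Int,Int}  # child indices into the tape (0 = unused)
    value::Float64        # variable index (as Float64) or constant value
end

tape = [
    Node(:var,   :none, (0, 0), 1.0),  # 1: x[1]
    Node(:const, :none, (0, 0), 2.0),  # 2: literal 2
    Node(:call,  :^,    (1, 2), 0.0),  # 3: x[1]^2
    Node(:var,   :none, (0, 0), 2.0),  # 4: x[2]
    Node(:call,  :sin,  (4, 0), 0.0),  # 5: sin(x[2])
    Node(:call,  :+,    (3, 5), 0.0),  # 6: x[1]^2 + sin(x[2])
]

# Forward evaluation is a single loop over the vector, with no tree walking.
function evaluate(tape, x)
    vals = zeros(length(tape))
    for (i, n) in enumerate(tape)
        vals[i] = n.op == :var   ? x[Int(n.value)] :
                  n.op == :const ? n.value :
                  n.fun == :^    ? vals[n.args[1]]^vals[n.args[2]] :
                  n.fun == :sin  ? sin(vals[n.args[1]]) :
                  n.fun == :+    ? vals[n.args[1]] + vals[n.args[2]] :
                  error("unsupported operation $(n.fun)")
    end
    return vals[end]
end

evaluate(tape, [3.0, 0.0])  # 9.0 == 3.0^2 + sin(0.0)
```

By contrast, the equivalent `:(x[1]^2 + sin(x[2]))` is a tree of `Expr` nodes, each a separately heap-allocated, GC-tracked object.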
From the perspective of SCIP (consuming expression graphs directly) this sounds good to me. I understand we would not need any bridge then, and could easily port the NLPBlock-based code to be used in direct_model mode.
Is anyone currently working on this?
If so, I'll be happy to pitch in and collaborate on this. Otherwise, I'll start taking a pass at this. Just let me know.
No one is working on this. But this is a massive project to undertake (on the order of months).
We're working on getting funding to support a sustained development effort, but things are still uncertain due to COVID etc.
It won't happen before JuMP 1.0. If you have development time for NLP, it's probably better fixing some of the outstanding issues in Ipopt etc.
Migration of Pavito, Pajarito, Alpine to MOI are good near-term NLP activities.
@odow Entirely understood. I'll keep pushing out fixes as I come across them in the meantime.
That said my interest in this is somewhat selfish.
There’s a family of nonsmooth NLP solvers which require generalized derivative information (Lexicographic or Clarke type) that I’m getting interested in. So being able to specify a backend other than the standard AD scheme is of interest. I can do this currently by grabbing the NLPEvaluator from JuMP and manipulating it to construct a new evaluator. That said, this is a bit clunky (and prone to breaking when someone updates the AD in JuMP).
I’ve also had a few people chat with me about wanting to extract computational tapes from JuMP models (for testing things akin to forward-reverse interval contractors and so on…).
So, while this is definitely a long-term task, I’m interested in starting to get the MOI foundation for its setup.
@ccoffrin I’d be happy to do that if I ever have some genuinely free time 😊. That said, there are some research activities I’m looking at that may benefit from a more extensible NLP interface (which is motivating my interest in this).
prone to breaking when someone updates the AD in JuMP
This isn't going to change in the near-term.
being able to specify a backend other than the standard AD scheme is of interest
The first change on the AD in JuMP will be to enable this feature :)
Awesome! Thanks for the clarification.
I started a Google doc outlining the high-level tasks required for a re-write of NLP support in JuMP.
https://docs.google.com/document/d/1SfwFrIhLyzpRdXyHKMFKv_XKeyJ6mi6kvhnUZ2LEEjM/edit?usp=sharing
Comments and suggestions for a more detailed design are appreciated.
It has been on my TODO for literally 6 months to talk to you and work out how the ChainRules project fits into this story. It might be that it just doesn't, but it feels like it might.
@oxinabox I think if the plan to make a more generic derivative back-end works out, then ChainRules could be a part of that! The goal would be to have a more generic differentiation interface so users can pick solutions that are well suited to their specific use case.
@blegat @odow I was thinking of next steps for the CP sets, and I was wondering how to support nonlinear functions (like absolute value). Based on the current thoughts about this NLP issue, do you think adding `AbsoluteValueFunction <: AbstractScalarFunction` makes sense (with the only argument being an `AbstractScalarFunction`)? In the sense that this development should be useful with expression rewriting as it's currently being planned. Do you propose another kind of design for this functionality, or is it just too soon for such questions? Thanks in advance!
I don't see the harm in defining `AbsoluteValueFunction` and seeing where it takes us. It makes sense to explore the design space.
There are no real plans for other expression types. One option is AMPL-esque: https://github.com/jump-dev/AmplNLWriter.jl/blob/60b147e83d0c3801f1e565828b3ec486848a3b65/src/NLModel.jl#L248-L277. Another option is to investigate ModelingToolkit. The main requirement is that it supports functions with ~1e6 terms.
OK, thanks! At least, such a solution could fit better with bridges, I think.
Another option is to investigate ModelingToolkit. The main requirement is that it supports functions with ~1e6 terms.
There are a few demos already well beyond that 😅. We need symbolic array support to be completed to make a few things nicer, though. @blegat added an MOI binding to GalacticOptim.jl, so we'll be making use of all this in the background.
@ChrisRackauckas I had initially not studied ModelingToolkit a lot, because it does not seem to be well-equipped for discreteness. I have listed five pain points for now. Maybe they could be solved by using Symbolics.jl instead of ModelingToolkit.jl?
How do you make something like this work? This could be the way a user enters a count constraint.
@parameters t σ ρ β
sum([t, σ] .~ [ρ, β])
For now, this code yields the following error:
ERROR: MethodError: no method matching +(::Equation, ::Equation)
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
+(::ChainRulesCore.AbstractThunk, ::Any) at C:\Users\Thibaut\.julia\packages\ChainRulesCore\GciYT\src\differential_arithmetic.jl:138
+(::ChainRulesCore.Tangent{P, T} where T, ::P) where P at C:\Users\Thibaut\.julia\packages\ChainRulesCore\GciYT\src\differential_arithmetic.jl:162
...
Stacktrace:
[1] add_sum(x::Equation, y::Equation)
@ Base .\reduce.jl:24
[2] _mapreduce
@ .\reduce.jl:408 [inlined]
[3] _mapreduce_dim
@ .\reducedim.jl:318 [inlined]
[4] #mapreduce#672
@ .\reducedim.jl:310 [inlined]
[5] mapreduce
@ .\reducedim.jl:310 [inlined]
[6] #_sum#682
@ .\reducedim.jl:878 [inlined]
[7] _sum
@ .\reducedim.jl:878 [inlined]
[8] #_sum#681
@ .\reducedim.jl:877 [inlined]
[9] _sum
@ .\reducedim.jl:877 [inlined]
[10] #sum#679
@ .\reducedim.jl:873 [inlined]
[11] sum(a::Vector{Equation})
@ Base .\reducedim.jl:873
[12] top-level scope
@ REPL[13]:1
Also, not all constraints are equality or inequality: you might have `alldifferent(t, σ, ρ, β)` as a constraint. In a sense, this is similar to `alldifferent(t, σ, ρ, β) == true`, but the latter uses reification (which is more cumbersome to implement in the solvers). I guess this might be solved by adding a new field in `OptimizationSystem` for global constraints. Also, this interface does not yet allow for integer or Boolean variables, but I guess this should not be too hard to add.
At its core, ModelingToolkit seems to be excellent for computing derivatives, but what about functions that do not have a derivative? (At least not in a classical sense…) How does the package work when there is no way to compute a derivative? How strong is the differentiability hypothesis?
(Also, I guess it would be useful to deal with functions that are not differentiable everywhere and to compute subgradients.)
I didn't see a lot of ways to add new rules for rewriting expressions in ModelingToolkit.jl, only in Symbolics.jl. However, to get to an optimisation solver, it's not sufficient to simplify expressions; you also need to reach a form that the targeted solver allows (like second-order cones and linear constraints). Is there a way to write this kind of transformation on top of ModelingToolkit.jl/Symbolics.jl, i.e. without throwing tens of thousands of lines of code at it? (For instance, MOI already has an excellent solution to this problem with bridges.)
Also, you mention that GalacticOptim.jl has MOI support, but only for the current, legacy NLP support: the current code does not seem to be able to extract information about the problem, like linear constraints. I bet that such a system would be more complex to write.
Not all parameters are optimisation variables: you might want to keep some parameter values in your constraints, for instance to reoptimise quickly when just a few values change. (https://github.com/jump-dev/ParametricOptInterface.jl/blob/master/docs/src/example.md) Maybe this can be solved by having the user specify which parameters are optimisation variables and which ones are not. The problem that arises with this kind of parameter is that the product of two optimisation variables is unlikely to be convex, but if one of them is fixed then the expression is linear. This kind of information should be available to the rewriting system.
Maybe they could be solved by using Symbolics.jl instead of ModelingToolkit.jl?
ModelingToolkit.jl is built on Symbolics.jl. Anything that Symbolics.jl can represent can go into the ModelingToolkit.jl types, which give it a modeling frontend.
How do you make something like this work? This could be the way a user enters a count constraint.
The symbolic system is as confused as I am. What are you trying to do there? Build an equation summing the left and right sides? We probably could specialize `mapreduce` expressions, but I haven't seen anyone want it.
you might have alldifferent(t, σ, ρ, β) as a constraint.
Yeah, we'd have to add that expression.
this interface does not allow yet for integer or Boolean variables
using Symbolics
@variables a::Int b::Bool
At its core, ModelingToolkit seems to be excellent for computing derivatives, but what about functions that do not have a derivative? (At least not in a classical sense…) How does the package work when there is no way to compute a derivative? How strong is the differentiability hypothesis?
It uses the same rule expressions as the AD system, so it will handle `ifelse`, `max`, `abs`, etc. in exactly the same cases, in the same expression-generating way. Part of what we're doing in AD systems soon is subgradients and abs-normal forms, so those same rules would carry over (by the same people, too).
I didn't see a lot of ways to add new rules for rewriting expressions in ModelingToolkit.jl, only in Symbolics.jl
Good, because there's no such thing as a ModelingToolkit.jl expression. All expressions are Symbolics.jl expressions.
but rather to get to a form that the targeted solver allows (like second-order cones and linear constraints). Is there a way to write this kind of transformation on top of ModelingToolkit.jl/Symbolics.jl, i.e. without throwing tens of thousands of lines of code at it?
Yeah, this is rather trivial really. It's definitely a good application of E-graphs. I do have a student looking into this, though I think the main focus of GalacticOptim.jl right now is really on nonlinear support since that's the bulk of the use cases.
Also, you mention that GalacticOptim.jl has MOI support, but only for the current, legacy NLP support: the current code does not seem to be able to extract information about the problem, like linear constraints. I bet that such a system would be more complex to write.
Use `islinear` and `jacobian`. It's no different from other cases where linearity is specialized on. I'd need @blegat's help to actually make that connect to MOI, but getting that kind of symbolic information is trivial, two-lines-of-code stuff.
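For instance, a sketch of that kind of extraction using Symbolics.jl's `linear_expansion` and `jacobian` (these two functions exist in Symbolics.jl; the framing around them is just illustrative of what an MOI backend would consume):

```julia
using Symbolics

@variables x y

# linear_expansion(ex, v) returns (a, b, islinear) such that ex == a*v + b,
# i.e. a symbolic linearity check plus the coefficient in one call.
a, b, islin = Symbolics.linear_expansion(2x + 3y + 1, x)

# The coefficient matrix of a system of affine expressions, which is the
# structural information needed to build linear constraints:
J = Symbolics.jacobian([2x + 3y, x - y], [x, y])
```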
Not all parameters are optimisation variables
Look at the definition of OptimizationProblem. There are states (optimized for) and parameters (not optimized for).
Thanks for your detailed answer! Regarding the `sum` syntax, it's really about mimicking what Julia does with native numbers:
t = 1
σ = 2
ρ = 1
β = 4
[t, σ] .== [ρ, β] # BitVector: [true, false]
sum([t, σ] .== [ρ, β]) # 1: one of the two equalities holds
Yes, that makes sense. Open an issue on Symbolics.jl?
Reposted from https://github.com/jump-dev/MathOptInterface.jl/issues/1538 Modified to add in a point about parameters given https://jump.dev/JuMP.jl/stable/manual/nlp/#Create-a-nonlinear-parameter
@frapac from our discussion:
Modeling languages are great, but for many nonlinear problems you just want to provide an objective and constraints to the solvers with minimal hassle. In economics, in particular, the equations might be impossible to even write in existing modeling languages. In other cases, maybe you want to use all sorts of Symbolics.jl/ModelingToolkit.jl trickery, but in the end have a function which just operates on component arrays.
Basically, you just want to call the optimizers with primitives as native as possible and without any DSL getting in the way. This is in the same spirit as https://github.com/jump-dev/MatrixOptInterface.jl, I believe.
In its current form, JuMP's NLP support is very scalar-centric, and this was one of the things that led to the creation of https://github.com/SciML/GalacticOptim.jl (note, however, that the hope was that the scope of GalacticOptim would eventually expand into far more interesting things, largely around applying problem reductions as well as other fancy AD things).
On the MOI side, there might not be that many things missing for a more direct pass-through. We probably need dense Jacobians and Hessians, and I think that anything that needs to allocate will be a problem if it wants to do passthrough to GPU-style algorithms; it might also make passthrough of things like ComponentArrays difficult. But that stuff could come later.
What might this look like to a user?
First, it is useful to have an object to collect derivatives and make different AD backends convenient. I will describe the ones from GalacticOptim, but the gradient wrappers are something that could be standardized and shared in SciMLBase, or just made specific to JuMP/MOI. For example, https://github.com/SciML/GalacticOptim.jl/tree/master/src/function provides some wrappers for all sorts of AD. It could do something like the following.
Write your Julia function. SciML allows passing in a parameter vector `p`, which would be nice, but this could be done with just a closure if that doesn't fit into JuMP very well.
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
f = OptimizationFunction(rosenbrock)
Which would dispatch to the default auto-differentiation. Or, to use Zygote or finite differences:
OptimizationFunction(rosenbrock, GalacticOptim.AutoZygote())
OptimizationFunction(rosenbrock, GalacticOptim.AutoFiniteDiff())
Or to pass in your own derivatives:
rosenbrock_grad(x, p) = [-2 * (p[1] - x[1]) - 4 * p[2] * x[1] * (x[2] - x[1]^2),
                         2 * p[2] * (x[2] - x[1]^2)]  # analytic gradient
OptimizationFunction(rosenbrock, rosenbrock_grad)
Finally, if the problem is constrained, then in the current GalacticOptim (which could be changed) it can generate the Jacobians and Hessians as required. The current design has an optional `cons`, but I think it would be much nicer if these were decoupled. Anyway, not essential either way.
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
function con2_c(x,p)
[x[1]^2 + x[2]^2,
x[2]*sin(x[1])-x[1]]
end
OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff(); cons = con2_c)
First, consider if there were MOI extensions instead. @frapac suggested something like
model = MOI.nlpsolve(f, g!, h!, p, x_lb, x_ub, c_lb, c_ub)
attach_linear_constraints!(model, A, b)
attach_quadratic_constraints!(model, Q, q, d)
which I would hope could have a "differentiation wrapper" version:
p = # the parameters
fun = OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff(); cons = con2_c)
model = MOI.nlpsolve(fun, p, x_lb, x_ub, c_lb, c_ub)
attach_linear_constraints!(model, A, b)
attach_quadratic_constraints!(model, Q, q, d)
Note that I added the parameters in the vector `p` to be in the same spirit as parameters in JuMP.
In fact, if this existed, then I think GalacticOptim could be extremely short and almost not exist. Or it could just be the `OptimizationFunction` wrapper. But I would hope that we can still do all of the linear constraints, complementarity constraints, etc.
The last piece of GalacticOptim that might be worth considering is its role in auto-differentiation rules for the optimizers themselves. We want to be able to establish AD rules for optimizers (e.g. https://en.wikipedia.org/wiki/Envelope_theorem), but that might be better discussed down the line.
To be rrule-friendly, we would probably want to avoid the mutation of adding constraints ex post, and instead have them in the constructor of the problem type. Mutation makes for difficult rules.
From the "DSL" perspective, I think that JuMP might actually work with a few constraints on supported features, although I think I would prefer the MOI extension instead.
In particular, if you want to provide a "vectorized" objective (and optionally nonlinear constraint) then:
Otherwise, I think that the existing JuMP and MOI machinery around linear constraints would be perfectly fine. The user would only have a single variable to work with, but I think that is fine.
Alternatively, if you wanted this to live in JuMP, then, to see possible usage in a one-variable-limited JuMP, let me adapt https://jump.dev/JuMP.jl/stable/tutorials/Nonlinear%20programs/rosenbrock/
using JuMP
import Ipopt
import Test
import AutoDiffWrappersFromWherever
function example_rosenbrock()
model = Model(Ipopt.Optimizer)
set_silent(model)
# call wrappers discussed above
rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
obj = OptimizationFunction(rosenbrock, AutoFiniteDiff())
@variable(model, x[1:2]) # or @passthrough_variable(model, x[1:2])
@NLvector_objective(model, Min, obj)
# ERRORS IF MODEL HAS MORE THAN ONE VARIABLE OR IF SCALAR
optimize!(model)
Test.@test termination_status(model) == MOI.LOCALLY_SOLVED
Test.@test primal_status(model) == MOI.FEASIBLE_POINT
Test.@test objective_value(model) ≈ 0.0 atol = 1e-10
Test.@test value(x[1]) ≈ 1.0
Test.@test value(x[2]) ≈ 1.0
return
end
function example_constrained_rosenbrock()
model = Model(Ipopt.Optimizer)
set_silent(model)
# call wrappers discussed above
rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
function con2_c(x)
[x[1]^2 + x[2]^2,
x[2]*sin(x[1])-x[1]]
end
obj = OptimizationFunction(rosenbrock, AutoFiniteDiff(), cons = con2_c)
@variable(model, x[1:2])
@NLvector_objective(model, Min, obj)
# THIS ADDS THE CONSTRAINT AS WELL. HESSIAN OF LAGRANGIAN ATTACHED
optimize!(model)
Test.@test termination_status(model) == MOI.LOCALLY_SOLVED
Test.@test primal_status(model) == MOI.FEASIBLE_POINT
Test.@test objective_value(model) ≈ 0.0 atol = 1e-10
Test.@test value(x[1]) ≈ 1.0
Test.@test value(x[2]) ≈ 1.0
return
end
What would `@NLvector_objective` do? Basically just register the objective, and potentially the constraints, in their most raw form when providing the implementation for MOI calls. It can access the `OptimizationFunction` structure for the function calls and just call them directly.
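For concreteness, here is a sketch of the glue such a macro might generate. The `CallbackEvaluator` name and the two-callback layout are hypothetical, but the `MOI.AbstractNLPEvaluator` interface methods used are MOI's existing API:

```julia
import MathOptInterface as MOI

# Hypothetical wrapper: hold raw objective and gradient callbacks and expose
# them through MOI's nonlinear evaluator interface, calling them directly.
struct CallbackEvaluator{F,G} <: MOI.AbstractNLPEvaluator
    f::F     # x -> f(x)
    grad::G  # (g, x) -> writes the gradient of f at x into g
end

MOI.features_available(::CallbackEvaluator) = [:Grad]
MOI.initialize(::CallbackEvaluator, ::Vector{Symbol}) = nothing
MOI.eval_objective(d::CallbackEvaluator, x) = d.f(x)
function MOI.eval_objective_gradient(d::CallbackEvaluator, g, x)
    d.grad(g, x)
    return
end

# Wrapping f(x) = x[1]^2 + x[2]^2 by hand:
d = CallbackEvaluator(x -> sum(abs2, x), (g, x) -> (g .= 2 .* x))
MOI.eval_objective(d, [1.0, 2.0])  # 5.0
```

A solver that consumes an `MOI.NLPBlockData` would then call these methods without ever seeing an expression graph.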
ScalarNonlinearFunction is merged. I'll hold off closing this until we add VectorNonlinearFunction.
Closing as done. We now have `ScalarNonlinearFunction` and `VectorNonlinearFunction`.
We discussed with @ccoffrin, @harshangrjn and @kaarthiksundar what the next-generation NLP could look like. Here is a gist of the idea. We create two new function types. First, `NonlinearExpressionFunction`, which will be used by JuMP to give the objective function and the constraints (as `NonlinearExpressionFunction`-in-`EqualTo`/`GreaterThan`/`LessThan`) to the MOI backend. Then, `SecondOrderBlackBoxFunction` will be what NLP solvers such as Ipopt support. But NLP solvers such as Alpine can directly support `NonlinearExpressionFunction`, as they exploit the expression structure.
We can then define the bridges `AffinetoSecondOrderBlackBox` and `QuadratictoSecondOrderBlackBoxBridge`, which transform `ScalarAffineFunction` and `ScalarQuadraticFunction` respectively into `SecondOrderBlackBoxFunction` (as is currently done in the Ipopt MOI wrapper, for instance).
The transformation from `NonlinearExpressionFunction` to `SecondOrderBlackBoxFunction` can be achieved by an `AutomaticDifferentiationBridge{ADBackend}` which computes the callbacks using `ADBackend`. We can move the current AD implementation from JuMP to `MOI.Utilities` as a `DefaultADBackend`, and `full_bridge_optimizer` adds `AutomaticDifferentiationBridge{DefaultADBackend}`. If the user wants to change the AD backend to `OtherADBackend`, there will be a JuMP function that internally removes `AutomaticDifferentiationBridge{DefaultADBackend}` and adds `AutomaticDifferentiationBridge{OtherADBackend}`.
@kaarthiksundar would be interested in working on this, and we thought a good first step is to define `SecondOrderBlackBoxFunction` and create the bridges `AffinetoSecondOrderBlackBox` and `QuadratictoSecondOrderBlackBoxBridge`. Let us know what you think!