jump-dev / MathOptInterface.jl

A data structure for mathematical optimization problems
http://jump.dev/MathOptInterface.jl/
Other
394 stars 87 forks source link

Array input arguments in nonlinear expressions #2402

Open odow opened 9 months ago

odow commented 9 months ago

Context: Yesterday I had a chat to @rluce about nonlinear expressions, particularly as they relate to Gurobi's upcoming nonlinear interface. We broadly agree on scalar nonlinear functions, and he had some preliminary ideas for vector-inputs.

cc @ccoffrin and @chriscoey for thoughts.

Our ultimate goal is to support examples like the following:

using JuMP, Ipopt
model = Model(Ipopt.Optimizer)
@variable(model, x[1:2, 1:2])
@objective(model, Max, log(det(X)))
# or perhaps the easier to manage;
@objective(model, Max, log_det(X))
using JuMP, Ipopt
model = Model(Ipopt.Optimizer)
@variable(model, x[1:3] >= 1)
p = 2
@objective(model, Min, norm(x, p))

Another key consumer of this would be https://github.com/jump-dev/MiniZinc.jl.

Changes required in JuMP

Changes required in MOI

  1. We'd need to support Array in MOI.(Scalar,Vector)NonlinearFunction. This seems pretty easy.
  2. We'd need to support Array in MOI.Nonlinear and be able to compute derivatives, etc.

The tricky part is all in 2.

We'd likely need some sort of Node(NODE_VECTOR, parent, n).

But matrices are a bit more complicated. We'd need to encode (rows, cols). One option would be to store the size as a packed value::Int64. That'd mean that we couldn't have matrices with side dimension greater than typemax(Int32)... but that seems okay.

encode(m::Int64, n::Int64) = encode(Int32(m), Int32(n))
encode(m::Int32, n::Int32) = reinterpret(Int64, (m, n))
decode(x::Int64) = Int64.(reinterpret(Tuple{Int32,Int32}, x))

norm(x) would look something like:

expr = Expression(
    [
        Node(NODE_CALL_MULTIVARIATE, -1, OP_NORM             ),
        Node(NODE_ARRAY,              1, 3 #= encode(3, 0) =#),
        Node(NODE_VARIABLE,           2, 1 #= x1 =#          ),
        Node(NODE_VARIABLE,           2, 2 #= x2 =#          ),
        Node(NODE_VARIABLE,           2, 3 #= x3 =#          ),
    ],
    [],
);

log_det(X) would then look something like:

expr = Expression(
    [
        Node(NODE_CALL_MULTIVARIATE, -1, OP_LOGDET                     ),
        Node(NODE_ARRAY,              1, 8589934594  #= encode(2, 2) =#),
        Node(NODE_VARIABLE,           2, 1 #= x11 =#                   ),
        Node(NODE_VARIABLE,           2, 2 #= x21 =#                   ),
        Node(NODE_VARIABLE,           2, 2 #= x12 =#                   ),
        Node(NODE_VARIABLE,           2, 3 #= x22 =#                   ),
    ],
    [],
);

Once you have the data structure, it seems to me that the AD should follow fairly well.

Output arguments

Still absolutely no idea.

odow commented 9 months ago

For vector-outputs, we could potentially do something like this:

A = [1 2; 3 4]
x = [MOI.VariableIndex(1), MOI.VariableIndex(2)]

f(x) = sum(A * x)

|  i | node_type    | parent | value  | 
|  1 | SCALAR_CALL  |     -1 | OP_SUM |
|  2 | RESULT_ARRAY |      1 | (2, 0) |
|  3 | RESULT       |      2 | NaN    |
|  4 | RESULT       |      2 | NaN    |
|  5 | VECTOR_CALL  |      2 | OP_MUL |
|  6 | ARRAY        |      5 | (2, 2) |
|  7 | CONSTANT     |      6 | 1      |
|  8 | CONSTANT     |      6 | 2      |
|  9 | CONSTANT     |      6 | 3      |
| 10 | CONSTANT     |      6 | 4      |
| 11 | ARRAY        |      5 | (2, 0) |
| 12 | VARIABLE     |     11 | 1      |
| 13 | VARIABLE     |     11 | 2      |

[1.0, 3.0, 2.0, 4.0]

@rluce, we could have dropped the .values list if you re-interpret the .value field of a NODE_VALUE to be float64 instead of an int64 index into the array of float64...

That'd simplify things to:

|  i | node_type    | parent | value  | 
|  1 | CALL         |     -1 | OP_SUM |
|  2 | RESULT_ARRAY |      1 | (2, 0) |
|  3 | RESULT       |      2 | NaN    |
|  4 | RESULT       |      2 | NaN    |
|  5 | CALL         |      2 | OP_MUL |
|  6 | ARRAY        |      5 | (2, 2) |
|  7 | CONSTANT     |      6 | 1.0    |
|  8 | CONSTANT     |      6 | 2.0    |
|  9 | CONSTANT     |      6 | 3.0    |
| 10 | CONSTANT     |      6 | 4.0    |
| 11 | ARRAY        |      5 | (2, 0) |
| 12 | VARIABLE     |     11 | 1      |
| 13 | VARIABLE     |     11 | 2      |
mlubin commented 9 months ago

Gurobi will support array-valued nonlinear expressions and also needs JuMP to compute derivatives? Not sure I understand what the interface is.

odow commented 9 months ago

Gurobi will support array-valued nonlinear expressions and also needs JuMP to compute derivatives?

No. They're going to do their own thing, and won't support arrays to start with.

But their interface is going to look very similar to the MOI data structure so we were just riffing on some ideas.

It'll look something like

GRBaddnonlinearexpression(
    grb,
    result_variable::Int,
    n_nodes::Int,
    n_data::Int,
    node_types::Vector{Int},
    node_parents::Vector{Int},
    node_values::Vector{Int},
    data::Vector{Float64},
)
rluce commented 9 months ago

Correct, no vector in- our output planned for Gurobi, but the concept is very tempting 😄 .

But matrices are a bit more complicated. We'd need to encode (rows, cols). One option would be to store the size as a packed value::Int64

Another option could be to put dimensionality information in the value array, viz.

A = [1 2; 3 4]
x = [MOI.VariableIndex(1), MOI.VariableIndex(2)]

f(x) = sum(A * x)

|  i | node_type    | parent | value  | 
|  1 | SCALAR_CALL  |     -1 | OP_SUM |
|  2 | RESULT_ARRAY |      1 | 9      |
|  3 | RESULT       |      2 | NaN    |
|  4 | RESULT       |      2 | NaN    |
|  5 | VECTOR_CALL  |      2 | OP_MUL |
|  6 | ARRAY        |      5 | 7      |
|  7 | CONSTANT     |      6 | 1      |
|  8 | CONSTANT     |      6 | 2      |
|  9 | CONSTANT     |      6 | 3      |
| 10 | CONSTANT     |      6 | 4      |
| 11 | ARRAY        |      5 | 5      |
| 12 | VARIABLE     |     11 | 1      |
| 13 | VARIABLE     |     11 | 2      |

[1.0, 3.0, 2.0, 4.0, 2.0, 0.0, 2.0, 2.0, 2.0, 0.0]

with the understanding that an ARRAY node will always read 2 consecutive values from the value array. Not pretty, but maybe more consistent with the idea that value is the place to store static data.

odow commented 9 months ago

I guess there are multiple options for the size. It probably doesn't really matter which one you go with, so long as it is consistent.

The other option, instead of directly coding RESULT values, is to have a OP_GET_INDEX node which can lookup into the array.

|  i | OP_GET_INDEX       |      2 | 1  |

The benefit of this is that you don't need all the RESULT rows. but the downside is that on the reverse pass you'd need to allocate storage for the array ops, where as the RESULT rows could be filled in-place like normal.