jump-dev / JuMP.jl

Modeling language for Mathematical Optimization (linear, mixed-integer, conic, semidefinite, nonlinear)
http://jump.dev/JuMP.jl/

Compatibility layer with SymbolicRegression.jl #3790

Closed MilesCranmer closed 3 weeks ago

MilesCranmer commented 1 month ago

I would like to explore using JuMP.jl to set up mixed-integer programming optimization problems in SymbolicRegression.jl and PySR. This would open up an entirely new area of optimization problems, because you could potentially evolve the symbolic objective itself without needing to re-compile. For example, this means we could do things like adding constraints that force the constants in a discovered symbolic expression within PySR to be integers.

To do this, I think what we basically need to do is set up some kind of compatibility layer between DynamicExpressions.jl and MathOptInterface.

Describe the solution you'd like

I'm not sure where to even start. Perhaps I can explain a bit about the DynamicExpressions.jl API.

DynamicExpressions API

Basically there is this Node type that stores references to operators in a lightweight struct:

mutable struct Node{T} <: AbstractExpressionNode{T}
    degree::UInt8  # 0 for constant/variable, 1 for cos/sin, 2 for +/* etc.
    constant::Bool  # false if variable
    val::T  # If it is a constant, this stores the actual value
    feature::UInt16  # If it is a variable (e.g., x in cos(x)), this stores the feature index.
    op::UInt8  # If an operator, this is the index into operators.binops or operators.unaops
    l::Node{T}  # Left child node. Only defined for degree=1 or degree=2.
    r::Node{T}  # Right child node. Only defined for degree=2.
end

(It's a binary tree now, but there is a working implementation which generalises things to n-arity operators)

The reason it is written like this is so that the expression can be completely runtime-generated, type stable, and memory-efficient. There is no extra compilation needed besides the kernels defined for the user's required operators.

To use this, we define a space of operators and variable names:

operators = OperatorEnum(;
    binary_operators=(+, -, *, /), unary_operators=(sin, cos, exp)
) # ::OperatorEnum{Tuple{typeof(+),typeof(-),...
variable_names = ["x", "y"]

We can then create a variable (indicating the first column of some dataset):

x = Node{Float64}(; feature=1)

We can build up more complex expressions using these basic building blocks:

y = Node{Float64}(; feature=2)
c = Node{Float64}(; val=2.0)

complex_node = Node(; op=3, l=x, r=Node(; op=1, l=y, r=c))  # x * (y + c)

where the 3 indicates * and 1 indicates + – referencing the OperatorEnum we created.

This can get turned into an Expression object that stores both the OperatorEnum and the raw Node type which references the operators:

struct Expression{T,N<:AbstractExpressionNode{T},D<:NamedTuple} <: AbstractExpression{T,N}
    tree::N
    metadata::Metadata{D}
end

For example,

complex_expr = Expression(complex_node; operators, variable_names)

These are nice since, with the new feature here, you can use regular Julia math on them, and they will look up the right operator in the enum for you and generate the corresponding Node:

complex_expr_div_2 = complex_expr / 2.0

This expression includes its own metadata about the space of operators, so you can print this expression, or evaluate it directly:

complex_expr(randn(2, 5))  # Batched along second axis; output is shape (5,)
# ^Fast, and zero-compilation

This also allows for gradients with respect to features of the data:

using Zygote

complex_expr'(X)

Or the constants of the expression:

complex_expr'(X; variable=Val(false))

Again, this is a single evaluation kernel for any expression; no extra compilation needed.

Interface

Now, the interface. So, expressions have this general tree_mapreduce operation which is used to implement basically everything. It's like a mapreduce except the reduction is structured like (parent, children...). It also has built-in support for graphs that have shared subexpressions via the GraphNode type – used for avoiding double-counting as well as for caching.
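For instance, here is a minimal sketch of counting the nodes of the earlier complex_node (assuming the three-argument form of tree_mapreduce, where the map is applied to every node):

# Map every node to 1, then reduce in the structured (parent, children...) form.
n_nodes = tree_mapreduce(
    _ -> 1,
    (parent, children...) -> parent + sum(children),
    complex_node,
)
# x * (y + c) has three leaves and two operators, so n_nodes == 5.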

One of the main utilities for optimizing expressions is get_scalar_constants and set_scalar_constants!, which are implemented like so:

constants, references = get_scalar_constants(expression)
new_constants = constants .* 2
set_scalar_constants!(expression, new_constants, references)

My guess is that this interface for getting and setting constants is probably the right hook for connecting DynamicExpressions to MathOptInterface... What do you think?

The other thing (as mentioned above) is

expression'(data; variable=Val(false))

which gets the gradients using Zygote-generated kernels for each operator. This also has an interface with ChainRulesCore.jl so should work if you differentiate with respect to the expression object.

(The downside of this is that it is forward-mode.)
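For example, differentiating a scalar loss with respect to the expression object would look something like this (a sketch only; I haven't verified the exact tangent type Zygote returns here):

using Zygote

X = randn(2, 5)
loss(ex) = sum(abs2, ex(X))
tangent, = Zygote.gradient(loss, complex_expr)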

So... What is the right way to optimize DynamicExpressions.jl expression objects with JuMP?


Expected question: Why not post this issue in MathOptInterface?

I think for a specific implementation case we could probably have an issue there as well, or maybe in DynamicExpressions.jl if that makes more sense. I set up an issue here for tracking the overall goal of having a compatibility layer between JuMP and SymbolicRegression.jl, regardless of how that takes shape.

odow commented 1 month ago

It's not obvious to me how you intend the end-user to combine JuMP and SymbolicRegression.

Assuming we can magic any internals, what is your ideal API? Can you provide a small hypothetical example of what you would like to build?

MilesCranmer commented 1 month ago

Great questions! I'm honestly not even sure myself. It will probably evolve as I figure out what sorts of things are possible. (This is effectively an unexplored area of research, after all!)

For an example, let's take constraining constants in an expression to be integers. The current way a user would do this is as follows:

using SymbolicRegression

function my_loss(expr, dataset::Dataset{T,L}, options::Options) where {T,L}
    y_pred, completed = eval_tree_array(expr, dataset.X, options)
    !completed && return L(Inf)  # Bad expression (e.g., 1/0)

    y = dataset.y
    loss = sum(i -> abs2(y[i] - y_pred[i]), eachindex(y, y_pred)) / length(y)

    # Now, add soft integer constraint:
    integral_penalty = sum(expr) do node  # Sum over nodes of expression
        if node.degree == 0 && node.constant
            val = node.val
            L(abs(val - round(val)))  # Distance from nearest integer
        else
            zero(L)
        end
    end

    return loss + 100 * integral_penalty
end

The user passes this to an MLJ-style regressor class with a bunch of other available options:

using MLJ, Optim

model = SRRegressor(
    loss_function=my_loss,
    unary_operators=(cos, exp),
    binary_operators=(+, -, *, /),
    optimizer_algorithm=Optim.BFGS(),
)
mach = machine(model, X, y)
fit!(mach)

However, this user-provided loss function is basically just a hacky way of penalizing non-integral constants and hoping that the genetic algorithm + BFGS takes care of them (it cycles through many rounds of genetic operations, like swapping nodes or perturbing constants, followed by explicit BFGS optimization). i.e., nothing like proper MINLP!

So what would be nice is if there was a way to declare a constraint on the constants and use a MINLP algorithm interfaced by JuMP/MOI to properly do this sort of thing – maybe just where Optim.BFGS would normally get passed. (I'm not sure if it would simply go where optimizer_algorithm would go, or if the user would declare a JuMP objective within the loss function and pass it to the SRRegressor.)


P.S., here are some related requests from PySR users:

(and many many more related ones in that discussions page – which handles both the Julia and Python side)

odow commented 1 month ago

I still don't really understand where JuMP would fit into your example. Would the user write a JuMP model? Or is this some backend feature and you really just want access to a MINLP solver?

Not to dissuade you, but see: https://jump.dev/JuMP.jl/stable/should_i_use/#You-want-to-optimize-a-complicated-Julia-function

It's not obvious that JuMP is a good choice here.

MilesCranmer commented 1 month ago

Good questions – and please don't hesitate to dissuade me; I know next to nothing about MINLP.

  1. Would the user write a JuMP model?

I think this would be one possibility. The user would declare a JuMP model, and there would be an additional feature available that allows the user to specify a functional form (a DynamicExpressions.jl expression). Sort of like (modifying the quickstart example)

julia> @variable(model, x >= 0)
x

julia> @variable(model, 0 <= y <= 3)
y

julia> @function(model, f, inputs=1, internal_constants=false)
f

julia> @objective(model, Min, abs(f(x) - y) + abs(f(2 * x) - 5) + 0.1 * complexity(f))
abs(f(x) - y) + abs(f(2 * x) - 5) + 0.1 * complexity(f)

julia> @constraint(model, c1, 6x + 8y >= 100)
c1 : 6 x + 8 y ≥ 100

julia> @constraint(model, c2, 7x + 12y >= 120)
c2 : 7 x + 12 y ≥ 120

SymbolicRegression.jl would evolve that @function object as part of the optimization, which would be combined with a MINLP solver, leading to a symbolic form f as well as some scalar constants x and y. (This internal_constants=false would disable constant optimization within SymbolicRegression.jl – it would exclusively search for symbolic forms that involve variables, but no constants. For example, you could have f(x) = x, f(x) = x * x, f(x) = exp(x), and so on.)

The complexity(f) would eliminate the degeneracy from the infinite space of functions. (This only takes on discrete values, so would be totally tractable inside a MINLP scheme – you would run MINLP at each complexity from 1 to max complexity... Anyways, I could explain the details later, I don't want to distract from the main ideas)

You could set up the optimizers like the following:

ipopt = optimizer_with_attributes(Ipopt.Optimizer)
highs = optimizer_with_attributes(HiGHS.Optimizer)
juniper = optimizer_with_attributes(
    Juniper.Optimizer,
    "nl_solver" => ipopt,
    "mip_solver" => highs,
)
optimizer = optimizer_with_attributes(
    SymbolicRegressionJuMP.Optimizer,
    "minlp_solver" => juniper,
    "options" => SymbolicRegression.Options(unary_operators=(cos, exp), binary_operators=(+, -, *), maxsize=25)
)

The maxsize here constrains the maximum complexity of the expression.

However, this would obviously be a massive project... So, in the meantime, maybe:

  2. Or is this some backend feature and you really just want access to a MINLP solver?

This is also something I am interested in regardless of (1). It's probably much easier too. I've looked a bit and have found a lot of general NLP solvers, but I am specifically looking for an MINLP solver for blackbox functions (which DO have gradients, as well as Hessians). DynamicExpressions.jl can also provide info on convexity if needed. But the expressions are dynamic and might be linear or nonlinear – and the optimizer would not be able to know the exact expression (passing it the full expression graph would likely add huge overhead, since the expression is runtime-generated).

odow commented 1 month ago

Let me see if I can summarize what you'd like to do, and you tell me which bits I get wrong. I have only a vague understanding of how SymbolicRegression works.

  1. I have some input data x and scalar output targets y.
  2. I want to learn the symbolic representation of a function f(x) = y that minimizes some loss function l(x, y). Typically, l(x, y) = || f(x) - y ||_2 but it might also have some extra terms.
  3. f(x) is really f(x, p) where there is a structural form f and some parameters p.
  4. Given a particular structural form f, you currently use an optimizer (like BFGS) to minimize_p l(x, y, p)
  5. You then use derivative-free optimization (local search, GAs, etc) to iterate over different functional forms of f.
  6. In step 4, you'd like to be able to use a different optimizer, particularly one that lets you add constraints on p.
  7. The main constraints that you'd like to add are integrality restrictions on all of p.
  8. You also want to write constraints like "if there exists x^p then p <= 2" or "if x^{1 // p} then p is odd".

I assume you want to do something like this:

function optimize_expression_in_step_4(expression, dataset, options, optimizer)
    _, references = get_scalar_constants(expression)
    model = Model(optimizer)
    @variable(model, p[1:length(references)])
    @variable(model, residuals[1:length(dataset)])
    set_scalar_constants!(expression, p, references)
    y, _ = eval_tree_array(expression, dataset.X, options)
    @constraint(model, residuals .== y .- dataset.y)
    @objective(model, Min, sum(residuals.^2))
    # START-USER-CONSTRAINTS
    set_integer.(p)
    # END-USER-CONSTRAINTS
    optimize!(model)
    set_scalar_constants!(expression, value.(p), references)
    return objective_value(model)
end

To get it working, you need to be able to evaluate expression with JuMP.VariableRef as the parameters.

I have no idea how you can get the user to write constraints like "if x^{1 // p} then p is odd" in a generic way, but this seems like an issue that is bigger than the JuMP interface. They'd need to be able to iterate over references?

I don't think we need to do anything special in JuMP.

MilesCranmer commented 1 month ago

Thanks! Yes, your description of 1-8 is what I had in mind. There are only a couple possible differences.

One is that the optimization is typically decoupled from the user-specified loss function, in order for the genetic algorithm itself to be aware of any constraints – it can optimize things even if BFGS (or an MINLP solver) is disabled. So if the objective_value(model) has a way of returning soft penalties for violated constraints, that would be ideal. (The optimize! call itself would need to live somewhere else which seems doable from my end.)

The second is about how this works for the expression evaluation. Does JuMP.VariableRef record what operations are applied? Basically I can't quite figure out how, in your example, the model would know how to update the result of y when the parameters change. (For example, with Optim, I provide a function that calls eval_tree_array on the new parameters for each new trial)

MilesCranmer commented 1 month ago

It seems like it works, but it's really slow. DynamicExpressions.jl requires that the element type of the tree and the data is the same – for performance and preallocation reasons. So for this to work I had to type everything as an AbstractJuMPScalar.

Also, it seems like JuMP records the entire expression tree (?), which is in opposition to the use of DynamicExpressions which is designed around fast runtime-generated expressions. Needing to walk the expression tree dynamically with other data structures will likely use a lot of memory. Is there any way I can force JuMP to be content with eval_tree_array as some unknown nonlinear operation that has gradients available? Or is JuMP only compatible with a certain set of operators? For example, perhaps there is a way I can create a new EvalTreeExpr <: NonlinearExpr for this to work?

julia> using DynamicExpressions

julia> using JuMP

julia> using Ipopt: Ipopt

julia> x1 = Node{Float64}(; feature=1)
x1

julia> x2 = Node{Float64}(; feature=2)
x2

julia> operators = OperatorEnum(
           binary_operators=(+, -, *, /),
           unary_operators=(sin, cos),
       )
OperatorEnum{Tuple{typeof(+), typeof(-), typeof(*), typeof(/)}, Tuple{typeof(sin), typeof(cos)}}((+, -, *, /), (sin, cos))

julia> tree = cos(x1 * 2.1) - x2
cos(x1 * 2.1) - x2

julia> p0, references = get_scalar_constants(tree)
([2.1], Base.RefValue{Node{Float64}}[Base.RefValue{Node{Float64}}(2.1)])

julia> num_variables = length(p0)
1

julia> optimizer = Ipopt.Optimizer
Ipopt.Optimizer

julia> model = Model(optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Ipopt

julia> @variable(model, p[1:num_variables])
1-element Vector{VariableRef}:
 p[1]

julia> function convert_to_variable_ref(tree, params, ::Type{R}) where {R}
           i = Ref(0)
           tree_mapreduce(
               leaf -> if leaf.constant
                   Node{R}(; val=params[i[] += 1])
               else
                   Node{R}(; feature=leaf.feature)
               end,
               identity,
               (parent, children...) -> Node{R}(; parent.op, children),
               tree,
           )
       end
convert_to_variable_ref (generic function with 1 method)

julia> tree_ref = convert_to_variable_ref(tree, p, AbstractJuMPScalar)
cos(x1 * (p[1])) - x2

julia> X = randn(Float64, 2, 32)
2×32 Matrix{Float64}:
 -0.0746526   0.364179  -0.31086    -0.27601    -0.111302  -0.655292   0.79584    0.849459  -1.58574   -0.132552  -0.999372  …   0.815621  -1.0162    0.877786   1.82748   -0.977439   0.25327   -1.5235    0.151345  -0.134643  1.6589   0.825483
  1.79968    -0.725315   0.0160492  -0.0165158  -1.52329   -0.179642  -0.434624  -1.00659    0.875373  -0.750719  -1.78184      -1.09908    0.253982  0.822796  -0.620156  -1.76847   -0.507072  -1.18368  -0.522273   0.922352  2.16542  0.109187

julia> @variable(model, converter)
converter

julia> @constraint(model, converter == 1)
converter = 1

julia> X_ref = Matrix{AbstractJuMPScalar}(converter .* X)
2×32 Matrix{AbstractJuMPScalar}:
 -0.07465259226587102 converter  0.36417901289093735 converter  -0.31085990947789494 converter  …  0.15134495266926729 converter  -0.1346426329962789 converter  1.6589021930524135 converter  0.8254827460019445 converter
 1.799683617390219 converter     -0.7253151845805736 converter  0.016049244525629433 converter     -0.5222730762965133 converter  0.9223524172130876 converter   2.1654162971520536 converter  0.10918691593060266 converter

julia> eval_tree_array(tree_ref, X_ref, operators)
(AbstractJuMPScalar[cos(-0.07465259226587102 converter*p[1]) - (1.799683617390219 converter), cos(0.36417901289093735 converter*p[1]) - (-0.7253151845805736 converter), cos(-0.31085990947789494 converter*p[1]) - (0.016049244525629433 converter), cos(-0.27600982416010994 converter*p[1]) - (-0.01651584685659015 converter), cos(-0.11130200072339379 converter*p[1]) - (-1.5232920468979532 converter), cos(-0.6552919255545688 converter*p[1]) - (-0.1796421756636822 converter), cos(0.795840390388174 converter*p[1]) - (-0.4346236223235513 converter), cos(0.8494586131074104 converter*p[1]) - (-1.0065930799112859 converter), cos(-1.585735689597626 converter*p[1]) - (0.8753733720730349 converter), cos(-0.13255195039681938 converter*p[1]) - (-0.7507187114742331 converter)  …  cos(-1.0162002402114096 converter*p[1]) - (0.2539823566729166 converter), cos(0.8777861446417738 converter*p[1]) - (0.822795814625541 converter), cos(1.827476863381431 converter*p[1]) - (-0.6201558037364908 converter), cos(-0.9774388479783699 converter*p[1]) - (-1.768471457571797 converter), cos(0.25327022257524606 converter*p[1]) - (-0.5070717696587647 converter), cos(-1.5234961669454194 converter*p[1]) - (-1.183675456014598 converter), cos(0.15134495266926729 converter*p[1]) - (-0.5222730762965133 converter), cos(-0.1346426329962789 converter*p[1]) - (0.9223524172130876 converter), cos(1.6589021930524135 converter*p[1]) - (2.1654162971520536 converter), cos(0.8254827460019445 converter*p[1]) - (0.10918691593060266 converter)], true)

Perhaps I need to have some type of new type that simply records the application of eval_tree_array?

odow commented 1 month ago

It's a little tricky to reply to each of your points, but here's an attempt

So if the objective_value(model) has a way of returning soft penalties for violated constraints

JuMP doesn't return solutions that violate constraints. If you want soft constraints, these need to be modeled explicitly.

You could do:

    if !is_solved_and_feasible(model)
        return Inf
    end
    return objective_value(model)

Does JuMP.VariableRef record what operations are applied?

We implement operator overloading that returns new expression trees. sin(x) returns a JuMP.NonlinearExpr with NonlinearExpr(:sin, Any[x]).
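For example (a small REPL sketch from memory; exact printing may vary by JuMP version):

julia> model = Model();

julia> @variable(model, x);

julia> expr = sin(x)
sin(x)

julia> expr.head, expr.args
(:sin, Any[x])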

the model would know how to update the result of y when the parameters change.

I'm assuming here that

    set_scalar_constants!(expression, p, references)
    y, _ = eval_tree_array(expression, dataset.X, options)

has set the JuMP variables in the tree, so y is a nonlinear expression of JuMP variables. It doesn't operate on the actual numeric values.

It seems like it works, but it's really slow. DynamicExpressions.jl requires that the element type of the tree and the data is the same – for performance and preallocation reasons. So for this to work I had to type everything as an AbstractJuMPScalar. Also, it seems like JuMP records the entire expression tree (?),

Yes, your code snippet is pretty much what I expected to happen.

You could perhaps do X_ref = Matrix{AbstractJuMPScalar}(AffExpr.(X)) to get rid of converter.

Needing to walk the expression tree dynamically with other data structures will likely use a lot of memory.

Not as much as actually solving the MINLP :smile:

Is there any way I can force JuMP to be content with eval_tree_array as some unknown nonlinear operation that has gradients available? Or is JuMP only compatible with a certain set of operators?

So this depends on what MINLP solver you want to use.

In cases like Alpine, BARON, and Couenne, they require the expression graph, and do not support arbitrary user-defined functions (because they use a priori knowledge of convexity, monotonicity, etc. for spatial branching).

If you use Juniper or KNITRO, you will get only a local solution to the MINLP, but they can support user-defined functions.

You could do something like:

function optimize_expression_in_step_4(expression, dataset, options, optimizer)
    _, references = get_scalar_constants(expression)
    model = Model(optimizer)
    M, N = length(dataset), length(references)
    @variable(model, p[1:N])
    @variable(model, residuals[1:M])
    last_x = nothing
    function eval_fn(i::Int, x::T...) where {T<:Real}
        if last_x !== x
            set_scalar_constants!(expression, collect(x), references)
            last_x = x
        end
        y, _ = eval_tree_array(expression, dataset.X, options)
        return y[i]
    end
    ops = map(1:M) do i
        return add_nonlinear_operator(
            model, 
            N, 
            (x...) -> eval_fn(i, x...),
            # Will use ForwardDiff for the gradient automatically; otherwise provide
            # (g::AbstractVector{Float64}, x::Float64...) -> nothing
            # which fills in `g` with the gradient.
            name = Symbol("op_$i"),
        )
    end
    @constraint(model, [i in 1:M], residuals[i] == ops[i](p...) - dataset.y[i])
    @objective(model, Min, sum(residuals.^2))
    # START-USER-CONSTRAINTS
    set_integer.(p)
    # END-USER-CONSTRAINTS
    optimize!(model)
    set_scalar_constants!(expression, value.(p), references)
    return objective_value(model)
end

but I don't think it will work very well if you have large M or N.

Note that I'm still not convinced JuMP is a good fit for this.

If I had to summarize, it seems like SymbolicRegression is a local-search based algorithm that is made to evaluate many candidates fast and efficiently with minimal overhead.

JuMP (for MINLPs at least) is the opposite: given a lot of structure, find the best possible answer.

Why can't you just take the output from BFGS and round the outputs? Or do a local search, rounding some values up and some values down?

MilesCranmer commented 1 month ago

Thanks, add_nonlinear_operator looks very useful. Can it handle vector outputs, or is it only scalar output? eval_tree_array is batched over a dataset and much of the performance gains have been from using SIMD loops over multiple rows of data.

Indeed even a local search would be useful, so Juniper or KNITRO sound like potential options. The outer loop of SymbolicRegression would effectively turn this into a global search.

If I had to summarize, it seems like SymbolicRegression is a local-search based algorithm that is made to evaluate many candidates fast and efficiently with minimal overhead.

It’s more subtle than this; it’s hard to put it in either camp. Internally SR.jl wraps different “genetic operators” / “mutations” that modify a symbolic expression. It randomly applies these operators to expressions according to some categorical distribution. (It also has some population and multi-population evolutionary algorithms.) But those mutations can be simple perturbations to the expressions, or local search algorithms like BFGS, or even global search algorithms. The user controls the frequency at which each mutation occurs, so there isn’t really a strict need for a mutation to be a given speed. For example, one of the default mutations is a full BFGS optimization run.

Evolutionary algorithms are non-convergent so SR.jl just runs for the user-specified number of iterations or exit criteria.

What I am imagining is that the user would specify a JuMP model, and there could be a way to quickly evaluate whether the constraints are violated or not, which could be used in the frequently-called loss function. Is it possible to do this check in JuMP, for some user-specified initial parameters? Or would they need to do an explicit optimize! call to have that is_solved_and_feasible available?

Then, should the user want to use some integral constraints, one of the mutations would run an explicit MINLP solver. This would be called much less frequently so wouldn’t need to be quite as fast.

Rounding after BFGS

This is one approach for this type of thing. But since it’s not a real MINLP solver it doesn’t get the correct solutions, and we instead have to rely on the genetic algorithm randomly perturbing the constants until it gets lucky.
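Concretely, the rounding approach amounts to something like the following sketch, where loss_of is a hypothetical helper that re-evaluates the loss on the dataset:

constants, references = get_scalar_constants(expression)
baseline = loss_of(expression)
set_scalar_constants!(expression, round.(constants), references)
if loss_of(expression) > baseline
    set_scalar_constants!(expression, constants, references)  # revert if worse
end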

Another hack that can work for simple problems is to provide integers as features to the model. But this relies on the genetic algorithm randomly finding the right combination of features — which is very inefficient. It would be much better to have an explicit solver for this type of thing.

odow commented 1 month ago

Can it handle vector outputs, or is it only scalar output?

No, scalar output only.

Is it possible to do this check in JuMP, for some user-specified initial parameters?

Somewhat: https://jump.dev/JuMP.jl/stable/manual/solutions/#Checking-feasibility-of-solutions. But it isn't optimized to evaluate this many times quickly.
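Something like this (untested sketch, assuming a model with variables p and a vector of candidate values):

trial = Dict(p[i] => candidate_values[i] for i in eachindex(p))
report = primal_feasibility_report(model, trial)  # maps violated constraints to violations
is_feasible = isempty(report)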

SymbolicRegression is a local-search based algorithm

I include GAs as a local-search. You can't generate a proof of global optimality.

Let's step back a bit: what is the dimension of input and output? How big are the expressions?

It seems like you have small problems with simple constraints, and a very complicated objective function. Overly generalizing: JuMP is really structured for problems with complicated constraints and simpler objective functions.

MilesCranmer commented 1 month ago

No, scalar output only.

I see. Is this something that does not exist because there is not a need for it, or is it fundamentally incompatible with the internals? If the former, how much effort would it be to add this type of feature?

I include GAs as a local-search. You can't generate a proof of global optimality.

Got it, indeed that is true.

Let's step back a bit: what is the dimension of input and output? How big are the expressions?

This varies by multiple orders of magnitude across the community's use-cases. Some users might have inputs with size (100,000 rows, 100 features), and want to search for expressions with hundreds of operators, while some others might have inputs with size (10, 1), and the ground truth expression would be something simple like y = 3 * cos(2 * x). Sometimes it might even be (1, 1) – i.e., a single scalar number – if the user is trying to discover a simple approximation to some number using only integers, operators, and other mathematical constants.

But across all of these cases, sometimes users will want to impose constraints that seem appropriate for an MINLP solver to optimize.

I don't want to generalize the community's use-cases (there seems to be broad interest in integral and other discrete constraints) but for the problems I work on, it's usually size (1000 rows, 10 features), and I am looking for expressions that have maybe a complexity of 30 (30 total operators or constants or instances of variables in the expression).

odow commented 1 month ago

or is it fundamentally incompatible with the internals? If the former, how much effort would it be to add this type of feature?

It requires a fairly major redesign of some of the most complicated code in JuMP... but it's on our roadmap: https://jump.dev/JuMP.jl/stable/developers/roadmap/ https://github.com/jump-dev/MathOptInterface.jl/issues/2402

For input and output, I mean length(references) and length(dataset).

JuMP is built to solve problems that have O(10^3)-O(10^5) inputs (which will become decision variables) and outputs (which will become constraints here), and solving just this one problem may take seconds to minutes to hours.

Coming back to the "should I use JuMP" page, I think the answer for your use-case might be "no".

Couldn't you just get the user to provide you a function that returns true/false for violated constraints? They're unlikely to write very complicated constraints right?

MilesCranmer commented 1 month ago

So, length(references) is typically between 1 and 20, and length(dataset) is between 1 and 100,000. It really depends on the user and what they are trying to achieve.

Couldn't you just get the user to provide you a function that returns true/false for violated constraints? They're unlikely to write very complicated constraints right?

As I described above, this is the current approach I recommend users try if they have integral constraints. However, (1) it's a bit hacky, and (2) it doesn't work that well because BFGS is not an MINLP solver. This is why I am interested in some kind of JuMP integration.

MilesCranmer commented 1 month ago

Ah, I see https://jump.dev/JuMP.jl/stable/tutorials/nonlinear/tips_and_tricks/#User-defined-operators-with-vector-outputs. The memoization approach sounds like it could work. Are there any solvers that have multiple parameter searches running in parallel? Otherwise I could just pre-allocate a single buffer for the vector output, and update it whenever the hash of the parameters change. Seems pretty straightforward on my end.
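Something like the following is what I have in mind (rough sketch; apart from the DynamicExpressions calls, all names are placeholders):

function make_memoized_eval(expression, references, dataset, options)
    last_p = nothing
    buffer = Float64[]
    # Scalar-output operator for data point i; recompute the whole batched
    # evaluation only when the parameter tuple changes.
    function eval_i(i::Int, p::Float64...)
        if p !== last_p
            set_scalar_constants!(expression, collect(p), references)
            buffer, _ = eval_tree_array(expression, dataset.X, options)
            last_p = p
        end
        return buffer[i]
    end
    return eval_i
end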

odow commented 1 month ago

Have you tried solving some of these as MINLPs? How long do they take? I would assume that even the small problems take seconds.

Are there any solvers that have multiple parameter searches running in parallel?

I don't think so, mainly because for a long time Julia was not thread-safe to call back into from C, so we typically disable multithreading if there are callback oracles in the solver wrappers.

odow commented 1 month ago

I guess we're overlooking the more obvious:

function optimize_expression_in_step_4(expression, dataset, options, optimizer)
    _, references = get_scalar_constants(expression)
    function loss_fn(p...)
        set_scalar_constants!(expression, collect(p), references)
        y_pred, completed = eval_tree_array(expression, dataset.X, options)
        if !completed
            # This is likely problematic. They should be encoded as constraints
            # instead.
            return Inf
        end
        return sum(abs2(a - b) for (a, b) in zip(dataset.y, y_pred))
    end
    function loss_fn_gradient(g::AbstractVector, p...)
        g .= loss_fn'(p...)  # Or something?
        return nothing
    end
    model = Model(optimizer)
    @variable(model, p[1:length(references)], Int)
    @operator(model, op_loss_fn, length(references), loss_fn, loss_fn_gradient)
    @objective(model, Min, op_loss_fn(p...))
    optimize!(model)
    set_scalar_constants!(expression, value.(p), references)
    return objective_value(model)
end

odow commented 1 month ago

Is there anything actionable to do here on the JuMP side of things?

MilesCranmer commented 1 month ago

Sorry I haven't had a chance to look at this again the past few days due to other things, will try to get to it on the weekend.

odow commented 4 weeks ago

I think regardless, I don't know if there is anything that we need to do here with respect to changes in JuMP or MathOptInterface (other than https://github.com/jump-dev/MathOptInterface.jl/issues/2402).

odow commented 3 weeks ago

I'm going to close this issue as "won't-fix" because there is nothing actionable that we plan to change in JuMP or MOI.

If you have more questions, please post on the forum, https://discourse.julialang.org/c/domain/opt/13, or the #jump-dev channel in Slack.