infiniteopt / InfiniteOpt.jl

An intuitive modeling interface for infinite-dimensional optimization problems.
https://infiniteopt.github.io/InfiniteOpt.jl/stable
MIT License

[FEATURE] Options for vector-valued functions #363

Open baggepinnen opened 3 weeks ago

baggepinnen commented 3 weeks ago

Hello there :wave: I'm writing regarding the following admonition in the docs :)

[Screenshot of the docs admonition on the vector-valued nonlinear function limitation]

I am solving optimal-control problems where the dynamics are encoded in the form of a ModelingToolkit model. ModelingToolkit can generate very efficient code for the right-hand side of the dynamics, i.e., I can obtain an executable function $f$ in

$$\dot{x} = f(x, u)$$

where $x$ and $u$ are potentially high-dimensional (`length(x) == 10` in the particular example I'm working on right now).

Tracing through $f$ with JuMP variables (infinite variables) "works", with plenty of hacks, but the result is very slow: the traced expression for $\dot{x}$ is too large to even print to the REPL without crashing Julia, while evaluating the code generated by MTK takes only 1 µs. The issue is that the code emitted by MTK contains many common subexpressions, inlined solutions of linear systems, etc. ($f$ is the dynamics of a high-dimensional multibody system). All of this is handled very efficiently in $f$ itself, where memory for the linear solves is manually allocated and freed, making $f$ free of GC allocations.
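For concreteness, the tracing looks roughly like this (a hypothetical sketch; `f_mtk` stands in for the MTK-generated right-hand side, and the dimensions are made up):

```julia
using InfiniteOpt

f_mtk(x, u) = -x .+ sum(u)  # placeholder; in reality the MTK-generated RHS

m = InfiniteModel()
@infinite_parameter(m, τ in [0, 1], num_supports = 100)
@variable(m, x[1:10], Infinite(τ))  # states
@variable(m, u[1:3], Infinite(τ))   # inputs

# Calling the RHS on symbolic variables builds one traced expression per
# state; for a multibody system these expressions become enormous.
xdot = f_mtk(x, u)
@constraint(m, [i in 1:10], deriv(x[i], τ) == xdot[i])
```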

To avoid explosive expression growth when tracing through $f$ with JuMP variables, I replace the linear-system solves `x = A \ b` that appear in $f$ with the following:

```julia
# Introduce temporary infinite variables x and impose A*x == b,
# instead of computing x = A \ b in place
x = InfiniteOpt.@variables(m, begin
    [1:n], Infinite(τ)  # temporary anonymous variables
end)[1]
lhs = A * x
@constraint(m, lhs .== b)
bout .= x  # write the implicit "solution" into the output buffer
```

i.e., I introduce temporary variables `x` and equality constraints `A*x .== b`. As I said, this works, but the memory requirements are sky high, and I believe it should be possible to improve the performance by at least 1000x over what this approach gives me.

Coming back to the admonition, I have been thinking about different ways to work around the limitation on vector-valued nonlinear functions but haven't found an alternative that works for me. What approach did you have in mind with

> However, this limitation can readily be removed if there is a use case for it

?

I am willing to jump through quite a few hoops to make this efficient if required :) Thanks for your time!

pulsipher commented 3 weeks ago

Hi there,

The quoted admonition is removed in the development version of the package, which will be released relatively soon. The current version uses its own nonlinear expression system, but the new version adopts the nonlinear modeling interface provided by JuMP, based on GenericNonlinearExpr. JuMP does not currently support vector-valued user-defined functions directly, but there are workarounds (see https://jump.dev/JuMP.jl/stable/tutorials/nonlinear/tips_and_tricks/#User-defined-operators-with-vector-outputs).
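For reference, the workaround in that tutorial memoizes the vector-valued function and registers one scalar operator per output; roughly (paraphrased from the linked page, with `foo` and `n_outputs` as generic placeholder names):

```julia
using JuMP

# Cache the last vector evaluation so that registering one scalar
# operator per output does not re-run `foo` n_outputs times. Separate
# caches are kept for Float64 calls and dual-number calls made during AD.
function memoize(foo::Function, n_outputs::Int)
    last_x, last_f = nothing, nothing
    last_dx, last_dfdx = nothing, nothing
    function foo_i(i, x::T...) where {T<:Real}
        if T == Float64            # plain evaluations
            if x !== last_x
                last_x, last_f = x, foo(x...)
            end
            return last_f[i]::T
        else                       # dual-number evaluations during AD
            if x !== last_dx
                last_dx, last_dfdx = x, foo(x...)
            end
            return last_dfdx[i]::T
        end
    end
    return [(x...) -> foo_i(i, x...) for i in 1:n_outputs]
end

# Usage with a toy 2-input, 2-output function:
f(x, y) = [x^2 + y, x * y]
memoized_f = memoize(f, 2)
model = Model()
@variable(model, z[1:2] >= 1)
@operator(model, op_f_1, 2, memoized_f[1])
@objective(model, Min, op_f_1(z[1], z[2]))
```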

In the longer term, adding array-valued nonlinear expression support is on the JuMP development roadmap (see https://github.com/jump-dev/MathOptInterface.jl/issues/2402 and https://jump.dev/JuMP.jl/stable/developers/roadmap/#Development-roadmap). Once this is supported by JuMP, InfiniteOpt will automatically inherit the ability as well.

For your model, I strongly suspect it will be much more performant to define the ODE model directly in InfiniteOpt (especially with the experimental InfiniteExaModels backend) rather than going through MTK. However, you can also try the aforementioned workaround to embed the MTK function as a user-defined function; in that case, I suggest trying the development version of InfiniteOpt (i.e., `Pkg.add(url = "https://github.com/infiniteopt/InfiniteOpt.jl", rev = "master")`).
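For illustration, a directly defined model looks something like the following toy sketch (pendulum-style dynamics and Ipopt chosen purely as examples, not your multibody system):

```julia
using InfiniteOpt, Ipopt

m = InfiniteModel(Ipopt.Optimizer)
@infinite_parameter(m, t in [0, 2], num_supports = 101)
@variable(m, x[1:2], Infinite(t))          # states
@variable(m, -1 <= u <= 1, Infinite(t))    # bounded input

# Dynamics written natively as InfiniteOpt constraints
@constraint(m, deriv(x[1], t) == x[2])
@constraint(m, deriv(x[2], t) == -sin(x[1]) + u)
@constraint(m, x[1](0) == 0.5)             # initial condition at t = 0

@objective(m, Min, integral(x[1]^2 + u^2, t))
optimize!(m)
```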

baggepinnen commented 3 weeks ago

> but there are workarounds (see https://jump.dev/JuMP.jl/stable/tutorials/nonlinear/tips_and_tricks/#User-defined-operators-with-vector-outputs).

I have tried this workaround without much success. Also, when I tried, the `InfiniteOpt.@register` macro crashed Julia due to the number of function arguments required for this system.

Ditching MTK is not an option for me in this case, for two reasons.

I have spent the day implementing a primitive trajectory-optimization package that performs the direct-collocation transcription, and I have obtained reasonable performance except for computing the Hessian of the Lagrangian, which is exactly where JuMP is struggling as well. Sparse forward-over-reverse AD does best, but not quite well enough. I have tried MadDiff.jl and FastDifferentiation.jl before, and they are blazingly fast for the operations they support, but I've found that support too limited to be workable for generic MTK models.

I will give the InfiniteExaModels package a quick spin though to see if it improves upon the JuMP AD in this case :)

pulsipher commented 3 weeks ago

Thanks for the clarification on your use case. This certainly provides a compelling argument for the proposed array-valued AD system in JuMP.

Coming up with an efficient approach to compute the Hessian for these types of problems is certainly a challenge and one of the reasons that the JuMP milestone is taking some time. Out of curiosity, did the memoize approach fail using the master branch? If so, can you provide some more details on the problem that caused the failure? This would be helpful info to consider going forward with the planned JuMP developments.

Depending on the types of operations and the length of the traced expressions, InfiniteExaModels may or may not work well. It is really intended for models that define the ODEs/PDEs directly in InfiniteOpt. Perhaps writing a bridge from MTK directly to an InfiniteModel would be better than generating the evaluation function, but of course that would require some development. I have considered such a bridge in the past, but I don't know MTK very well and have limited development time these days.

baggepinnen commented 2 weeks ago

> Out of curiosity, did the memoize approach fail using the master branch?

No, it was with the latest release. I think the problem was related to the vast number of methods registered by the `@register` macro when the function had many input arguments. In particular, `type_combos = vec(collect(Iterators.product(ntuple(_ -> Ts, num_args)...)))` becomes extremely large if you try to register a function that takes long arrays. Since arrays were not supported, you had to splat the arrays and use functions with one scalar input argument per array element. I don't think the new JuMP nonlinear interface currently has anything that can combat this issue?
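To make the blow-up concrete, here is a hypothetical back-of-the-envelope check (two stand-in argument types; the real macro used its own type list):

```julia
# Each argument can independently take any of length(Ts) types, so the
# old macro enumerated length(Ts)^num_args method signatures.
Ts = (Real, Symbol)   # stand-ins; the real list is InfiniteOpt-internal
num_args = 20         # e.g. a splatted state + input vector
n_combos = length(vec(collect(Iterators.product(ntuple(_ -> Ts, num_args)...))))
# n_combos == 2^20 == 1_048_576; materializing this array is already
# costly, and defining a method per combination is plausibly the crash.
```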

pulsipher commented 2 weeks ago

> > Out of curiosity, did the memoize approach fail using the master branch?
>
> No, it was with the latest release. I think the problem was related to the vast number of methods registered by the `@register` macro when the function had many input arguments. In particular, `type_combos = vec(collect(Iterators.product(ntuple(_ -> Ts, num_args)...)))` becomes extremely large if you try to register a function that takes long arrays. Since arrays were not supported, you had to splat the arrays and use functions with one scalar input argument per array element. I don't think the new JuMP nonlinear interface currently has anything that can combat this issue?

Actually it does. The new `@operator` macro creates a single operator object and doesn't require or generate a bunch of methods the way `@register` did. For this use case, it should be much faster. If you have the time, I suggest giving it a try to see if it works.
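A minimal sketch of what that could look like on the master branch (assuming it mirrors JuMP's interface here; `f_1` is a placeholder for one memoized scalar output of your MTK function):

```julia
using InfiniteOpt

f_1(x...) = sum(abs2, x)  # placeholder for a memoized MTK scalar output

m = InfiniteModel()
@infinite_parameter(m, τ in [0, 1])
@variable(m, x[1:10], Infinite(τ))

# @operator wraps the function in one operator object; no per-type
# methods are generated, unlike the old @register macro.
@operator(m, op_f1, 10, f_1)
@constraint(m, op_f1(x...) <= 1)
```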