modelica / fmi-standard

Specification of the Functional Mock-up Interface (FMI)
https://fmi-standard.org/

Provide adjoint derivatives #664

Closed. chrbertsch closed this issue 4 years ago.

chrbertsch commented 5 years ago

Currently, FMI 2.0 provides an interface for partial derivatives in the form of directional derivatives (e.g., Jacobian J times direction vector v, Jv).

For several use cases, it would be beneficial to get vector-Jacobian products v^T J, i.e., adjoint derivatives, from the FMU: for example, when using FMUs in the context of AI frameworks (where this is often called a "VJP", vector-Jacobian product). There, adjoint derivatives are used in the backpropagation process to do gradient-based optimization of parameters using automatic differentiation (AD). Also neural differential equations (https://github.com/JuliaDiffEq/DiffEqFlux.jl) or hybrid forms of AI and equation/physics-based models could be supported together with FMUs. This feature could widen the scope of FMI.
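For concreteness, the two products can be written as follows (a sketch in LaTeX notation, using the Jacobian J = \partial v_unknown / \partial v_known from the FMI 2.0 directional-derivative definition):

    % Directional (forward) derivative: Jacobian times seed vector
    \Delta v_{\mathrm{unknown}} = J \, \Delta v_{\mathrm{known}}

    % Adjoint (reverse) derivative: transposed seed vector times Jacobian
    \Delta v_{\mathrm{known}}^{T} = \Delta v_{\mathrm{unknown}}^{T} \, J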

fmi2Status fmi2GetAdjointDerivative(
    fmi2Component c,
    const fmi2ValueReference vUnknown_ref[],
    size_t nUnknown,
    const fmi2ValueReference vKnown_ref[],
    size_t nKnown,
    const fmi2Real dvUnknown[],   /* seed vector over the unknowns (length nUnknown) */
    fmi2Real dvKnown[]);          /* result vector over the knowns (length nKnown) */
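A minimal usage sketch (hypothetical importer code, not part of the proposal; the value references and the component c are made up, and the signature above is assumed):

    /* Hypothetical: extract the first row of a 2x2 Jacobian by seeding the
       unknowns with the basis vector e_1; result receives e_1^T * J.
       c is an already-instantiated fmi2Component. */
    fmi2ValueReference vUnknown_ref[2] = { 1, 2 };  /* made-up value references */
    fmi2ValueReference vKnown_ref[2]   = { 3, 4 };
    fmi2Real seed[2]   = { 1.0, 0.0 };              /* e_1 over the unknowns */
    fmi2Real result[2];                             /* first row of J */

    fmi2Status status = fmi2GetAdjointDerivative(
        c, vUnknown_ref, 2, vKnown_ref, 2, seed, result);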

I consider this proposal mature and intend to create a PR.

@HansOlsson, @ChrisRackauckas, @jph-tavella, @masoud-najafi, @CSchulzeTLK, @rfranke, and others: your comments would be very much appreciated.

HansOlsson commented 5 years ago

In terms of implementation effort this is straightforward, but time-consuming (much more than other variants). The problems I see are four-fold:

* It is too easy to "cheat" and compute it inefficiently from multiple directional derivatives; so we have to make sure that it is actually provided efficiently by the FMU.

* It is rather memory consuming (especially for co-simulation and model-code with loops). The memory consumption is due to reverse-mode AD (at least traditionally) storing all operations on reals and then running them in reverse order. One can make a trade-off and use less memory for this - and instead re-run operations; but that is slower. That is something we might need to consider as part of the design.

* This adds to the next point - in particular for co-simulation FMUs. In many scenarios we want the adjoint for the entire simulation interval (e.g. optimal control); can the FMU store that?

* The last point is that derivatives assume continuity around the point, and in many optimization scenarios the optimum is at the limit of triggering events. Having efficient adjoint derivatives does not help with that.
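As a rough illustration of the memory point above (a toy C example, not FMU code; it hand-codes reverse-mode AD for a two-operation function): the forward sweep stores every intermediate real on a "tape", and the reverse sweep then consumes those values in reverse order.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Toy function: y = sin(x1) * x2 */
        double x1 = 0.5, x2 = 2.0;

        /* Forward sweep: every intermediate is stored ("the tape"). */
        double w1 = sin(x1);
        double w2 = w1 * x2;

        /* Reverse sweep: propagate the adjoint seed back through the tape. */
        double w2_bar = 1.0;             /* seed: d(y)/d(y) */
        double w1_bar = w2_bar * x2;     /* d(w2)/d(w1) = x2 */
        double x2_bar = w2_bar * w1;     /* d(w2)/d(x2) = w1 */
        double x1_bar = w1_bar * cos(x1);

        printf("dy/dx1 = %g, dy/dx2 = %g\n", x1_bar, x2_bar);
        return 0;
    }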

ChrisRackauckas commented 5 years ago

The last point is that derivatives assume continuity around the point, and in many optimization scenarios the optimum is at the limit of triggering events. Having efficient adjoint derivatives does not help with that.

The adjoint implementation handles that, but it still needs the VJP to do so.

In terms of implementation effort this is straightforward, but time-consuming (much more than other variants).

I would not commit to a single implementation like ADOL-C (which is slow...), and instead allow this to be a function that anyone can provide for the VJP. There are many ways to implement the pullback, and fully non-allocating pullbacks derived in symbolic form are possible, but not in all cases, so I wouldn't commit to one way of doing it.

t-sommer commented 5 years ago

I wouldn't commit to one way of doing it

Does the proposed API impose any constraints on the implementation?

ChrisRackauckas commented 5 years ago

I was just saying I hope that's not done, and instead the API just has this function which anyone can write. I am not sure what the proposal implies on that point, but the paper said they extended FMI to build adjoints with ADOL-C.

chrbertsch commented 5 years ago

replying to @HansOlsson:

In terms of implementation effort this is straightforward, but time-consuming (much more than other variants). The problems I see are four-fold:

* It is too easy to "cheat" and compute it inefficiently from multiple directional derivatives; so we have to make sure that it is actually provided efficiently by the FMU.

This is an issue for the tool vendor implementing the API function, as with any other implementation detail of FMU export. It is up to the user to judge good or bad implementations. Perhaps for Modelica models as a source of FMUs we could set up a benchmark regarding partial-derivative implementations.

Regarding efficiency: with a good implementation of the adjoint derivatives, in cases where one needs only the vector-Jacobian products motivated above, one can expect significant speedups, as described in the paper above.

* It is rather memory consuming (especially for co-simulation and model-code with loops).

The memory consumption is due to reverse-mode AD (at least traditionally) storing all operations on reals and then running them in reverse order. One can make a trade-off and use less memory for this - and instead re-run operations; but that is slower. That is something we might need to consider as part of the design.

This is also up to the implementer, not the interface.

* This adds to the next point - in particular for co-simulation FMUs. In many scenarios we want the adjoint for the entire simulation interval (e.g. optimal control); can the FMU store that?

Do you mean the values at the end of different macro steps? (These could be stored outside the FMU.) Or the intermediate values within one macro step of a co-simulation FMU? Then this would be a feature of "intermediate variable access" in FMI 3.0.

* The last point is that derivatives assume continuity around the point, and in many optimization scenarios the optimum is at the limit of triggering events. Having efficient adjoint derivatives does not help with that.

But this does not make the situation worse compared to directional derivatives, right? As mentioned above, adjoint derivatives are also heavily used in contexts beyond physics-based models (e.g., neural networks).

replying to @ChrisRackauckas:

I was just saying I hope that's not done, and instead the API just has this function which anyone can write.

This is the basic idea of the FMI standard: only the interface is defined; the implementation of the interface functions is up to the exporting tools.

I am not sure what the proposal is implying on that, but the paper said they extended FMI to build adjoints with ADOL-C.

One should see this only as an example.

replying to @t-sommer:

Does the proposed API impose any constraints on the implementation?

No

jph-tavella commented 5 years ago

I fully agree with @chrbertsch. The proposed API does not impose any constraints on the implementation by tool vendors. Performance, memory consumption, accuracy of calculations, etc. are an issue for the tool vendors; from the user's point of view, it is their responsibility to judge good or bad implementations, and then to prefer FMUs from one tool over another.

chrbertsch commented 5 years ago

Regular FMI Design Meeting:

* The description should be at least as good as that for directional derivatives.
* It should not depend on a specific implementation.
* We should have a concrete proposal in a PR. Christian: I will work on this.
* We should ask the AI/ML community if this is the only change they would need.

chrbertsch commented 5 years ago

I have started working on this issue on a branch: https://github.com/chrbertsch/fmi-standard/tree/adjoint-derivatives. Feel free to comment and contribute.

masoud-najafi commented 5 years ago

Please correct me if I am wrong. This API allows retrieving the Jacobian matrix with a single call to fmi2GetAdjointDerivative by setting nUnknown = NX and nKnown = NX (where NX is the number of states). Also, dvUnknown is almost useless, because the importer can do the multiplication outside of the function. The APIs fmi2GetDirectionalDerivative and the proposed fmi2GetAdjointDerivative do almost the same thing. In other words, each API can be obtained from the other via an appropriate wrapper. If we want fmi2GetAdjointDerivative, fmi2GetDirectionalDerivative is no longer needed.

chrbertsch commented 5 years ago

@masoud-najafi: You cannot get the full Jacobian matrix with one call of fmi2GetDirectionalDerivative or fmi2GetAdjointDerivative.

fmi2GetDirectionalDerivative returns a column vector of size nUnknown that equals the Jacobian matrix times the seed vector: \Delta v_unknown = J \Delta v_known. (Please note that the seed vector \Delta v_known has the same size as the vector v_known; I think this is not yet stated explicitly, but otherwise the formula does not make sense.)

fmi2GetAdjointDerivative returns a row vector of size nKnown that equals the seed vector (transposed) times the Jacobian: \Delta v_known^T = \Delta v_unknown^T J. (Please note that the seed vector \Delta v_unknown has the same size as the vector v_unknown.)

The importer cannot do the multiplication outside the function, as the return value of the function is already the result of the multiplication.

Thus one needs multiple calls of fmi2GetAdjointDerivative or fmi2GetDirectionalDerivative to construct the Jacobian matrix.

If the FMU can calculate either directional or adjoint derivatives efficiently (e.g., by means of AD), then in the case of sparse Jacobians it is not efficient to compute directional derivatives from adjoint derivatives or vice versa; which of the two can be implemented efficiently depends on whether the FMU supports forward-mode AD, reverse-mode AD, or both.
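To make the difference concrete, here is a hypothetical importer sketch (helper names are made up; it assumes square dimensions nUnknown = nKnown = n, a row-major buffer J of size n*n, scratch buffers seed and buf of size n, and that fmi2GetAdjointDerivative exists as proposed above): one directional-derivative call with seed e_j yields column j, and one adjoint call with seed e_i yields row i, so either way n calls are needed for the full matrix.

    #include "fmi2Functions.h"  /* FMI 2.0 header; fmi2GetAdjointDerivative is hypothetical */

    /* Column-by-column via directional derivatives: buf = J * e_j */
    static void build_jacobian_by_columns(fmi2Component c,
            const fmi2ValueReference vUnknown_ref[],
            const fmi2ValueReference vKnown_ref[],
            size_t n, fmi2Real *seed, fmi2Real *buf, fmi2Real *J) {
        for (size_t j = 0; j < n; j++) {
            for (size_t k = 0; k < n; k++) seed[k] = (k == j) ? 1.0 : 0.0;
            fmi2GetDirectionalDerivative(c, vUnknown_ref, n, vKnown_ref, n, seed, buf);
            for (size_t i = 0; i < n; i++) J[i * n + j] = buf[i];
        }
    }

    /* Row-by-row via adjoint derivatives: buf = e_i^T * J */
    static void build_jacobian_by_rows(fmi2Component c,
            const fmi2ValueReference vUnknown_ref[],
            const fmi2ValueReference vKnown_ref[],
            size_t n, fmi2Real *seed, fmi2Real *buf, fmi2Real *J) {
        for (size_t i = 0; i < n; i++) {
            for (size_t k = 0; k < n; k++) seed[k] = (k == i) ? 1.0 : 0.0;
            fmi2GetAdjointDerivative(c, vUnknown_ref, n, vKnown_ref, n, seed, buf);
            for (size_t j = 0; j < n; j++) J[i * n + j] = buf[j];
        }
    }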

masoud-najafi commented 5 years ago

Then I do not understand why and how it is indicated in the above paper that one "row" of the Jacobian matrix can be retrieved with only a single call to fmi2GetAdjointDerivative. Can anyone clarify this by specifying the arguments of fmi2GetAdjointDerivative?

ChrisRackauckas commented 5 years ago

Just use the basis vector e_i as the seed for the i-th row. It follows directly from this being the VJP. For more information I'd just link to my lecture notes, which build up differentiable programming from a vjp/jvp standpoint:

https://mitmath.github.io/18337/lecture9/autodiff_dimensions
https://mitmath.github.io/18337/lecture10/estimation_identification
https://mitmath.github.io/18337/lecture11/adjoints

t-sommer commented 4 years ago

I've started with the implementation of fmi3GetAdjointDerivatives() for the Reference FMUs on https://github.com/t-sommer/Reference-FMUs/tree/adjoint-derivatives.
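For a model whose Jacobian is available as a dense matrix, the exporter-side computation can be as simple as a vector-matrix product (a hypothetical sketch, not the actual Reference FMUs code):

    #include <stddef.h>

    /* Hypothetical: adjoint derivative for a model with a known dense
       n x n Jacobian J stored row-major: dvKnown = dvUnknown^T * J. */
    static void get_adjoint_derivative(const double *J, size_t n,
                                       const double *dvUnknown, double *dvKnown) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t i = 0; i < n; i++)
                sum += dvUnknown[i] * J[i * n + j];  /* (v^T J)_j = sum_i v_i * J_ij */
            dvKnown[j] = sum;
        }
    }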

chrbertsch commented 4 years ago

Merged into master with #722.