modelica / fmi-standard

Specification of the Functional Mock-up Interface (FMI)
https://fmi-standard.org/

Getting multiple forward or adjoint directional derivatives with one call. #1902

Open chrbertsch opened 8 months ago

chrbertsch commented 8 months ago

(From a discussion with @jaeandersson on efficiency improvements for using FMI in optimization problems)

As a very simple example, consider the scalar function f(x) = sin(x). The forward or adjoint directional derivative is df1(x, v) = cos(x)v. If you can calculate two directional derivatives at once, you have df2(x, v1, v2) = (cos(x)v1, cos(x)v2). df2 is a lot cheaper than two calls to df1, since cos(x) only needs to be calculated once, and trigonometric functions are an order of magnitude more expensive than additions/multiplications. The same holds for the very large functions resulting from Modelica tools. Keep in mind that the intermediate operations and their linearizations (cos(x) above) usually have to be recalculated between calls, since otherwise memory use can blow up.

In terms of the API, the support could be added in a backwards-compatible way by permitting nSensitivity to be a multiple of the total number of unknowns in fmi3GetDirectionalDerivative, or a multiple of the total number of knowns in fmi3GetAdjointDerivative.

t-sommer commented 8 months ago

Would it be possible to calculate and cache the values upon the first invocation of the respective API functions?

jaeandersson commented 8 months ago

> Would it be possible to calculate and cache the values upon the first invocation of the respective API functions?

Yes, it's possible, but often not desirable or practical. It would require storing every intermediate operation used for the calculation, which may require too much memory.

jaeandersson commented 8 months ago

To add to this: Think of the case where the BLT sorting in a Modelica tool results in a series of Newton iterations. In the forward/adjoint propagation, you would need to do linear solves using the same linearized system (transposed for adjoints). Multiple forward/adjoint directions just correspond to multiple right-hand sides of the linear systems. Working with multiple right-hand sides is usually more efficient. Also, if you have to redo the Jacobian calculation and/or factorization, this cost will be the same regardless of the number of directions.

HansOlsson commented 8 months ago

To explain the proposal: I think one should view it as an interface that computes J*V for a matrix V instead of a vector v (and similarly for the adjoint). I can see that it might offer some minor advantage (especially for adjoints), but it is not clear to me that it is needed, and I can see some alternatives.

> To add to this: Think of the case where the BLT sorting in a Modelica tool results in a series of Newton iterations. In the forward/adjoint propagation, you would need to do linear solves using the same linearized system (transposed for adjoints). Multiple forward/adjoint directions just correspond to multiple right-hand sides of the linear systems. Working with multiple right-hand sides is usually more efficient. Also, if you have to redo the Jacobian calculation and/or factorization, this cost will be the same regardless of the number of directions.

However, the Jacobian factorization is normally the same one as used for computing the values, so one could either cache it from that call or consider an interface that computes both (at least for the forward case). If the Jacobian is recomputed and refactorized during the solution, it's messier: either all factored Jacobians need to be stored (typically fewer than a handful) or one has to "cheat".

Sending in multiple directions requires that all of the directions are known at the same time, whereas caching would handle the different directions being computed in sequence.

If we are concerned that caching takes memory and time when not used, one could consider doing something special before the value call.

chrbertsch commented 8 months ago

FMI Design meeting:
Pierre: How many tools support this?
Torsten: We should start creating a prototype before including it in the standard.
Pierre: How many tools support adjoint derivatives?
Christian: New tools come up, for optimization and sensitivity calculation.
Karl: This is a tool optimization; one could do it differently.
Klaus: If we see a prototype for an FMU, we can discuss this.
Klaus: This proposal can also be applied to directional derivatives.

jaeandersson commented 8 months ago

To take a step back: Fundamentally, what is important is that we can calculate multiple directional derivatives efficiently. If that can be achieved with caching inside the FMUs, doing it in one call is not necessary. I think there are fundamental reasons why a single call can be faster than multiple calls (such as being able to use matrix operations internally and needing less caching/checkpointing), but caching across multiple calls might be fast enough.

chrbertsch commented 1 month ago

Comment: this could also be done in a layered standard that defines new additional functions. If it is to be included in a minor version, a working group should work on a proposal.