modelica / ModelicaStandardLibrary

Free (standard conforming) library to model mechanical (1D/3D), electrical (analog, digital, machines), magnetic, thermal, fluid, control systems and hierarchical state machines. Also numerical functions and functions for strings, files and streams are included.
https://doc.modelica.org
BSD 3-Clause "New" or "Revised" License
452 stars 165 forks source link

Catastrophic cancellation in PadeDelay1 #4409

Open henrikt-ma opened 1 month ago

henrikt-ma commented 1 month ago

This concerns a potential issue with the example ModelicaTest.Blocks.PadeDelay1.

In System Modeler, der(padeDelay1.x[1] is computed using the following expression:

der(padeDelay1.x[1]) = padeDelay1.a1[10] * padeDelay1.x[10] + padeDelay1.a1[9] * padeDelay1.x[9] + padeDelay1.a1[8] * padeDelay1.x[8] + padeDelay1.a1[7] * padeDelay1.x[7] + padeDelay1.a1[6] * padeDelay1.x[6] + padeDelay1.a1[5] * padeDelay1.x[5] + padeDelay1.a1[4] * padeDelay1.x[4] + padeDelay1.a1[3] * padeDelay1.x[3] + padeDelay1.a1[1] * padeDelay1.x[1] + padeDelay1.a1[2] * padeDelay1.x[2] + padeDelay1.b11 * padeDelay1.u

The first time this is evaluated, all terms are zero except the first and the last one. The first one is -6.5472907499999936e+17, while the last one is 6.5472907499999936e+17, which leads to completely catastrophic cancellation. We happen to get away with it on all platforms except one, but the catastrophic cancellation as such exists on all platforms.

Is it System Modeler which is not smart enough about handling the PadeDelay equations, or is the catastrophic cancellation happening in other tools as well?

On a side note, if I enable balance in the PadeDelay, the simulation runs fine, but I haven't investigated whether this is thanks to actually eliminating the catastrophic cancellation.

HansOlsson commented 1 month ago

Well, the question is really whether cancellation happens in the model - and the answer is that it does:

We know thatder(x[1]) starts at 0, and the formula for computing it involves a1[10]*x[10].

Without balancing x[10] starts at 0.1, and a1[10] is -6.5e18, so obviously there must be cancellation. With balancing x[10] start at 25.6 and a1[10] is -5.67 (all a1-coefficients are between -550 and -3.26 and abs(x[i])<1600), so the cancellation is avoided.

What isn't clear is whether this cancellation is really problematic or not, as without balancing der(x[1]) goes smoothly up to 1e16, and still the output looks the same as with balancing.

As an example modifying the unbalanced model to set x[1].start=1e14 barely modifies the Padé output indicating that the model is robust with regards to the errors in that term (when using 64-bit floating point numbers), and thus having a non-zero derivative for it doesn't seem that problematic.

To me the obvious improvement would be to make balance the default, but that isn't backwards compatible.

Note that if we were to increase n=18 then the unbalanced model seems to hang with Dymola; whereas the balanced model even runs (slowly) with n=30.

henrikt-ma commented 4 weeks ago

What isn't clear is whether this cancellation is really problematic or not, as without balancing der(x[1]) goes smoothly up to 1e16, and still the output looks the same as with balancing.

On the affected platform, we get garbage in the order of 1e2 in the derivative, which appears to be enough to make the error control effectively hang the simulation. On the other platforms, we happen to get an exact 0.0 instead of the noise, and integration completes in no time.

To me the obvious improvement would be to make balance the default, but that isn't backwards compatible.

I suppose the backward incompatibility would be easy to accept if the change is considered a bugfix.