Questionable parametrization of V_pulse in Spice3 library leads to numerically unreliable computations

modelica / ModelicaStandardLibrary

Free (standard conforming) library to model mechanical (1D/3D), electrical (analog, digital, machines), magnetic, thermal, fluid, control systems and hierarchical state machines. Also numerical functions and functions for strings, files and streams are included.

https://doc.modelica.org

BSD 3-Clause "New" or "Revised" License

475 stars 169 forks source link

Questionable parametrization of V_pulse in Spice3 library leads to numerically unreliable computations #4478

Open casella opened 1 week ago

casella commented 1 week ago

While analyzing possible regressions from MSL 4.0.0 to 4.1.0 in OpenModelica, we found issues with several Spice3 model:

Modelica.Electrical.Spice3.Examples.Inverter
Modelica.Electrical.Spice3.Examples.FourInverters
Modelica.Electrical.Spice3.Examples.InvertersApartRecord
Modelica.Electrical.Spice3.Examples.InvertersExtendedModel
Modelica.Electrical.Spice3.Examples.Nor
Modelica.Electrical.Spice3.Examples.Spice3BenchmarkFourBitBinaryAdder

all failing at the beginning of the simulation with this runtime error:

division by zero at time 1.000000000300011, (a=inf) / (b=0), where divisor b expression is: v.Tfalling - v.Twidth

I checked the Inverter model more in depth. It contains an instance v of the Modelica.Electrical.Spice3.Sources.V_pulse model:

https://github.com/modelica/ModelicaStandardLibrary/blob/8e7876b5414475abab003c466ca49050812aeb01/Modelica/Electrical/Spice3.mo#L3129-L3159

The component v is instantiated as: https://github.com/modelica/ModelicaStandardLibrary/blob/8e7876b5414475abab003c466ca49050812aeb01/Modelica/Electrical/Spice3.mo#L300 so both v.PW and v.PER retain their default value of Modelica.Constants.inf. The resulting chain of parameter assignments is thus:

v.Trising := v.TR = 1e-13
v.Twidth := v.Trising + v.PW = 1e-13 + 1.7e308
v.Tfalling := v.Twidth + v.TF + v.TR = 1e-13 + 1.7e308 + 1e-13

it is then obvious that the computation of the denominator v.Tfalling - v.Twidth = 1e-13 + 1.7e308 + 1e-13 - (1e-13 + 1.7e308) in the expression for v.v is numerically ill-defined. In fact, it looks quite odd to me that this issue hasn't been raised before.

I also checked the documentation of the component, which is not very clear as it states "All parameters of sources should be set explicitly"; if "set explicitly" meant "have explicit modifiers", then there should be no defaults.

In any case, if a MSL component has default values, it should work properly when those values are not changed, and this is obviously not the case with this component, so we should do something about it.

HansOlsson commented 1 week ago

The (T0 + Tfalling - time)*(V2-V1)/(Tfalling - Twidth) is found inside else-part of if (time < T0 + Twidth) then V2-V1 else (T0 + Tfalling - time)*(V2-V1)/(Tfalling - Twidth).

Thus it should only be activated when time>=T0+Twidth, and as previously discussed Twidth is infinite, i.e., the else-branch is not active.

Trying to evaluate an else-branch in such cases is a clear tool-error.

casella commented 1 week ago

@HansOlsson I'm not sure about that.

It is well-known that writing something like

v = if x>=0 then sqrt(x) else 0;

does not prevent a tool from computing sqrt(x) with negative x. Why should this case be different?

casella commented 1 week ago

I checked the structure of those failing models. They all have one feature in common: on the left side, there is a V_pulse component which is obviously meant to generate some kind of input signal to the circuit. On the right side, there is another V_pulse component that apparently provides the voltage supply to the circuit. I'm not 100% sure why the right-side component was not implemented with a V_constant source, but I guess the intent was to start the simulation with the source switched off and then ramp it up gradually, to avoid initialization problem. It would be good to have some feedback from @kristinmajetta about this, as she actually wrote those models in 2009.

If that is the intent, basically using V_pulse as a truncated ramp source, then we should modify the defaults so that this ramp is computed reliably. What we need to avoid is the division by zero when computing those differences, so we need the default for PER to be larger than the default for PW, e.g.

  parameter SI.Time PW = 1e10 "Pulse width";
  parameter SI.Time PER= 1e11 "Period";

I guess those two values (approximately 300 and 3000 years) are large enough to represent an infinite time horizon for an electronic circuit for all practical purposes, but they will avoid the division-by-zero issue.

EDIT: unfortunately this won't solve the Tfalling-Twidth = 0 issue, since 1e-13 + 1e10 + 1e-13 - (1e-13 + 1e10) is still not very numerically reliable.

casella commented 1 week ago

Comments from other tool developers are welcome 😃

HansOlsson commented 1 week ago

@HansOlsson I'm not sure about that.

It is well-known that writing something like
v = if x>=0 then sqrt(x) else 0;
does not prevent a tool from computing sqrt(x) with negative x. Why should this case be different?

Because that is due to event-epsilon, and as stated in https://specification.modelica.org/maint/3.6/operators-and-expressions.html#evaluation-order "relational operators generating state or time events will during continuous integration have the value from the most recent event".

In x>=0 the variable x may during continuous integration drop below 0, whereas time < T0 + Twidth with right-hand-side infinity would require that time magically increased to infinity (and beyond) during integration to a finite point.

Added: Oh, and even if time magically did increase to infinity it wouldn't change anything, as the boolean expression keeps the value from the most recent event; so it would require time to be at infinity at an event (when integrating between finite points in time), and then decrease to be inside the finite interval during the continuous integration.

casella commented 1 week ago

I checked more in depth what happens in OpenModelica. If we define ModelicaServices.Machine.inf with any value up to half than the largest representable IEEE 754 double precision number (0.89885e308) this works as expected, see @HansOlsson's comment. With larger values of ModelicaServices.Machine.inf, I get the division by zero. Apparently, there is some subtle overflow issue going on in the run-time event detection mechanism.

Problem solved.

EDIT: the problem is not solved, see below.

casella commented 1 week ago

In fact, it turns out the issue with OpenModelica was triggered by #4042, which changed

https://github.com/modelica/ModelicaStandardLibrary/blob/d2dcb39e055ceb88fef5328a26255cdd094389bd/ModelicaServices/package.mo#L185-L186

into https://github.com/modelica/ModelicaStandardLibrary/blob/8e7876b5414475abab003c466ca49050812aeb01/ModelicaServices/package.mo#L184-L185

so in some sense it is indeed a regression caused by changes of the MSL. I'm not 100% sure how this should be handled, but I guess since ModelicaService can be patched by each tool vendor, this is really a tool issue, so the easiest way for us is just to patch it locally.

However, I'm still wondering, we are actually using Modelica.Constants.inf in models to represent very large values (e.g. as defaults for event times that are never triggered) that can still be represented. Which means, people do write (as in the V_pulse model) Modelica code such as

v = if time < Modelica.Constants.inf then 0 else 1;

Having the maximum representable value there seems to me a bit borderline (in the precise meaning of the word). Shouldn't we allow for some safety margin? Mabye 1e60 was really too small, but what about 1e300?

@HansOlsson, @maltelenz, @eshmoylova, @hubertus65 what do you think?

HansOlsson commented 1 week ago

v = if time < Modelica.Constants.inf then 0 else 1;

Having the maximum representable value there seems to me a bit borderline (in the precise meaning of the word). Shouldn't we allow for some safety margin? Mabye 1e60 was really too small, but what about 1e300?

@HansOlsson, @maltelenz, @eshmoylova, @hubertus65 what do you think?

There are three parts to this:

Should Modelica.Constants.inf have a value corresponding to the floating point numbers (technically DBL_MAX - not infinity)? I think the answer is clearly yes.
Should the line above work? I would say so.
Should we use Modelica.Constants.inf in this way in models or some other number? That isn't clear, but other models have shown that 1e300 wouldn't be safe either - it depends on the operations:
- if add a normal number it should work I think
- if you multiply it by another number it depends on the size of that number
- if you cube it (as was done in some models) you need something around 1e100 or less.

For cubing and multiplying I believe it would be better to rewrite the model.

casella commented 1 week ago

I moved the discussion to #4479 to avoid spillover.

casella commented 6 days ago

Given how the discussion in #4479 goes, maybe I should re-open this ticket.

We do have a regression in OpenModelica, due to #4042, which changed from MSL 4.0.0 to 4.1.0. The reason is that PW and PER went from 1e60 to 1.7e308.

Since there is no reason whatsoever to have such huge default values in this component, which is meant for circuit simulations that typically last few seconds, I think the best solution is to change them to more reasonable values like 1e10 (300 years). That achieves the same purpose (obtaining a ramp instead of a periodic pulse) and avoids the regression by steering clear of finite precision representation limits.

casella commented 6 days ago

I'd like to hear opinions outside the set of Dymola developers, if possible 😃

casella commented 6 days ago

I thought a bit more about this issue, and I think I can articulate a more compelling argument in favour of changing the defaults of this model than "it doesn't work in OpenModelica", which per se is not a valid argument to change a model in the MSL.

These are the parameter definitions of the V_pulse model

https://github.com/modelica/ModelicaStandardLibrary/blob/8e7876b5414475abab003c466ca49050812aeb01/Modelica/Electrical/Spice3.mo#L3131-L3145

and this is the equation that determines the output voltage

https://github.com/modelica/ModelicaStandardLibrary/blob/8e7876b5414475abab003c466ca49050812aeb01/Modelica/Electrical/Spice3.mo#L3156-L3159

The convention in the MSL is that if a parameter has a default value, this should correspond to some reasonable default behaviour. In this case, the periodic pulse degenerates into a truncated ramp with only a rising front.

The problem is that after #4042, the definition of Modelica.Constants.inf became: https://github.com/modelica/ModelicaStandardLibrary/blob/8e7876b5414475abab003c466ca49050812aeb01/Modelica/Constants.mo#L22-L23

Now, consider the binding equation for Twidth, which is used in the equation that determines the output voltage v:

Twidth = Trising + PW

if the value of PW is the maximum representable finite floating-point number and Trising is a positive number, then Twidth is obviously undefined, because you are adding something to a number which is by definition the largest possible one that can be represented in Modelica. In fact, trying to compute such a value should probably throw an overflow error, in the same way as computing 1/0 throws a division-by-zero error in all Modelica tools. Let me remind that IEEE 754 defines Inf and NaN and the associated semantics, but Modelica 3.6, which we must use for MSL 4.1.0, doesn't.

Hence, the default behaviour of this model is highly questionable and totally implementation-dependent; as such, it shouldn't belong to the Modelica Standard library. The fact that some Modelica tools can produce meaningful results out of such undefined expressions is irrelevant and doesn't make it a valid standard Modelica model.

Summing up, if Modelica.Constants.inf is not just a very large number but rather the largest one that can be represented in Modelica, it should not be used as a default for PW and PER; a smaller value should be used that ensures all expressions determining events in the equation for v, i.e., T0 + Tfalling, T0 + Trising, T0 + Twidth are valid expressions with well-defined values.

In #4479 I proposed to define a one-fits-all value for this kind of problem in Modelica.Constants, as I believe that the previously used value of 1e60 (possibly with a name different from inf) could do a good job in most cases, but I can agree that fulfilling that requirement in a completely general way may be problematic and bear unintended side effects.

Hence, the only solution I see to solve this issue is to define a domain-specific large value that makes sense in the context of the Spice3 package and doesn't lead to undefined expressions, e.g.:

constant SI.Time infiniteTime= 1e20 "Numerically safe very large time horizon (3e12 years)"

and use it to set default values for PW and PER in both V_pulse and I_pulse. PR #4481 implements this proposal.

I hope we can get some consensus on this 😅

HansOlsson commented 6 days ago

I thought a bit more about this issue, and I think I can articulate a more compelling argument in favour of changing the defaults of this model than "it doesn't work in OpenModelica", which per se is not a valid argument to change a model in the MSL. ... if the value of PW is the maximum representable finite floating-point number and Trising is a positive number, then Twidth is obviously undefined, because you are adding something to a number which is by definition the largest possible one that can be represented in Modelica. In fact, trying to compute such a value should probably throw an overflow error, in the same way as computing 1/0 throws a division-by-zero error in all Modelica tools. Let me remind that IEEE 754 defines Inf and NaN and the associated semantics, but Modelica 3.6, which we must use for MSL 4.1.0, doesn't.

That is not correct reasoning according to IEEE-754; it depends on how much you add, in this case the addition is insignificant and doesn't impact the result.

However, if OpenModelica (not MAP-Lib) wants to change the OpenModelica version of ModelicaServices to work around a weird bug in the OMC compiler (instead of fixing it), that would be one thing, but I still don't see why changing the Spice models is needed at all.

As far as I previously understood the bug in the OpenModelica compiler is that if time<1.7e308 then ... else a/0 incorrectly generates a division by zero, when time has a normal value.

Discussing this overflow is a red herring - as previously indicated the problem occurred if Modelica.Constants.inf was above something like 0.8e308 - and adding a small finite value to that will surely not overflow.

Similarly asking why people would want to integrate that far is a distraction, as the problem occurs for normal times - it's just triggered by these large values in unknown ways.

DagBruck commented 6 days ago

I think the problem is identified, "solved". https://github.com/modelica/ModelicaStandardLibrary/issues/4478#issuecomment-2413355601

casella commented 6 days ago

That is not correct reasoning according to IEEE-754; it depends on how much you add, in this case the addition is insignificant and doesn't impact the result.

IEEE-754 is totally irrelevant here. The Modelica Specification, upon which the MSL builds, does not mention IEEE-754 semantics, except for external fuctions, which are not involved at all in this regression issue.

Let me summarize again my point: the current code of V_pulse and I_pulse in Spice3 is invalid, because it requires to add some positive offsets to a number which is defined as the maximum representable one. This is a bad idea, irrespective of tool implementation details. The fix is simply to use a much smaller number that perfectly serves the modelling purpose. It's as simple as that.

casella commented 6 days ago

I think the problem is identified, "solved". #4478 (comment)

That's what I thought, until I went through a more thorough analysis later on. I hope I am entitled to change my mind 😄

I added a comment to make it clearer, I can delete that post if you wish.

casella commented 6 days ago

As far as I previously understood the bug in the OpenModelica compiler is that if time<1.7e308 then ... else a/0 incorrectly generates a division by zero, when time has a normal value.

That's not correct, it also took me a while to figure out.

The problem is that the conditions time < T0 + Trising and time < T0 + Twidth in the conditional equations contain expressions with undefined values, so they are not computed correctly (I guess there's some overflow involved), causing the else clause to be computed when it actually shouldn't.

As I wrote, the problem is upstream: in Modelica (not in IEEE 754 or other irrelevant contexts) the values of T0 + Trising and T0 + Twidth is not well defined, because you are adding something to the maximum possible representable number. From that point onwards, everything can happen, undefined stuff leads to non-deterministic behaviour. IMHO this makes this model invalid, period.

The solution is to completely avoid this issue, by using an appropriate default value for PW and PER. No big deal. No need to change Modelica.Constants.inf for that. See PR #4481.

dzimmer commented 4 days ago

commented on it in #4479

casella commented 3 days ago

@dzimmer I agree with your comments in #4479 for the long term, regarding how we could handle infinity better than we do now, both in Modelica 3.7 and MSL 4.2.0, and beyond. I understand that is a far from trivial matter that woud require a bit more time for a proper discussion, involving all tool vendors, not just DS and OSMC.

At the moment, however, we need to address this single regression in the short time availble before the release of MSL 4.1.0, without unwanted side effects on other libraries (also beyond the MSL) and other tools.

In #4479 you pointed out that as of MLS 3.6 and after #4042, the V_pulse model has an undefined behaviour.

Verdict: The model with its default parameterization is invalid as soon as T0 > 0.

so it has no place in the MSL as currently formulated.

PR #4481 fixes this issue for MSL 4.1.0 locally in a simple, effective way, without any side effects for other models and tools. I don't think anybody can seriously argue that 1e20 (250 times the currently estimated age of the universe) is not an appropriate default value for an infinite time constant to be used for the periods and pulse durations of the V_pulse and I_pulse models. There's no need to be an expert in electronic circuit simulation to agree with that.

This would allow us to close this issue now and proceed happily with the MSL 4.1.0 release.

We can then reopen the discussion in #4479 with a broader scope, trying to figure out a better way to handle these issues in general, which will possibly require coordinated changes in the MSL (version >= 3.7) and MLS (version >= 4.2.0), as well as FMI and eFMI, to the maximum possible extent possible.

I don't think there is any reason to hasten the discussion in #4478, which has potentially a much broader scope than this specific issue, to the very close deadline of MSL 4.1.0 release, nor to delay the release of MSL 4.1.0 because of this extremely niche issue.

Do you agree with that?

dzimmer commented 2 days ago

Seems reasonable to me.

People take the MSL often as an example how to model stuff. This use of Modelica.Constants.inf is not a good example. So it is an improvment when we remove this.