modelica-tools / csv-compare

Tool to compare curves from one CSV file with curves from other CSV files using an adjustable tolerance
https://www.modelica.org
BSD 3-Clause "New" or "Revised" License

Set a reasonable absolute tolerance to avoid spurious negatives for close-to-zero signals #66

casella closed this issue 2 months ago

casella commented 4 months ago

We had many false regressions in the MSL on signals that are supposed to be zero but aren't, due to numerical errors, see modelica/ModelicaStandardLibrary#4421

One solution is to remove those signals outright, but then you can't really make sure that they are close to zero, so that's not a good solution in general, unless those close-to-zero signals are somehow redundant.

@HansOlsson in that ticket suggests a pragmatic but actually quite effective fix for this problem: set the tube width not only based on a relative tolerance, but also on some a-priori nominal value for signals, e.g. 1e-3. This means that the error criterion should be something like:

(v - v_ref)/max(abs(v), nom) < tol

with nom = 1e-3 and tol = 0.002, the current setting for MSL testing.

@beutlich could you please take care of that at your earliest convenience?

Keeping @GallLeo in the loop.

beutlich commented 4 months ago

Can you please add two two-column CSV files where the behavior can be reproduced?

casella commented 3 months ago

@beutlich you can find many examples in PR modelica/ModelicaStandardLibrary#4420, where I first wanted to actually remove the offending signals from the CSV files. The idea is that this should not be necessary (i.e. the amended CSV-compare tool should pass the verification without removing those signals) with the change proposed in this ticket.

casella commented 3 months ago

This is an MWE: two CSV files that are expected to fail on v2 with the current CSV-compare tool and should instead succeed with the proposed change (nom = 1e-3, tol = 0.002), if I am not mistaken.

CSV1:

"time", "v1", "v2"
0.0, 0.0, 1e-8 
1.0, 1.0, -2e-8
2.0, 1.5, 1.5e-9
3.0, 1.0, 5e-9
4.0, 0.0, -1e-8

CSV2:

"time", "v1", "v2"
0.0, 0.0002, 3e-8 
1.0, 1.0004, 4e-8
2.0, 1.5008, 2.5e-9
3.0, 1.0002, 1e-9
4.0, 0.0, 2e-8
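
For the record, here is a minimal standalone sketch of why v2 should behave this way. This is not CSV-compare's actual tube algorithm; NominalCheck and Passes are hypothetical names, and the check is simplified to a per-signal band of width tol times the reference span, floored by nom:

// Simplified sketch of the proposed check: the allowed band around the
// reference is tol times the reference span, floored by the nominal value.
// This mimics the intent only, not CSV-compare's actual tube algorithm.
using System;
using System.Linq;

class NominalCheck
{
    static bool Passes(double[] vRef, double[] v, double tol, double nom)
    {
        // Span of the reference signal: max(range, abs(min))
        double span = Math.Max(vRef.Max() - vRef.Min(), Math.Abs(vRef.Min()));
        double width = tol * Math.Max(span, nom);
        return vRef.Zip(v, (r, a) => Math.Abs(a - r)).Max() <= width;
    }

    static void Main()
    {
        double[] v2Ref = { 1e-8, -2e-8, 1.5e-9, 5e-9, -1e-8 }; // v2 from CSV1
        double[] v2    = { 3e-8,  4e-8, 2.5e-9, 1e-9,  2e-8 }; // v2 from CSV2

        // Without the nominal floor the band is ~6e-11 wide and v2 fails;
        // with nom = 1e-3 the band is 2e-6 wide and v2 passes.
        Console.WriteLine(Passes(v2Ref, v2, 0.002, 0.0));  // False
        Console.WriteLine(Passes(v2Ref, v2, 0.002, 1e-3)); // True
    }
}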

Thanks!

bilderbuchi commented 3 months ago

set the tube width not only based on a relative tolerance, but also on some a-priori nominal value for signals, e.g. 1e-3. This means that the error criterion should be something like:

(v - v_ref)/max(abs(v), nom) < tol

with nom = 1e-3 and tol = 0.002, the current setting for MSL testing.

You saying "should be something like" implies to me that you're not totally sure about the specifics, yourself.

To avoid coming up with an off-the-cuff tolerance definition (that maybe subtly differs from other, similar functions), let me suggest using the one that Python's math.isclose uses:

abs(v - v_ref) <= max(tol*max(abs(v), abs(v_ref)), nom)

This is the same relation that Julia's isapprox uses.

NB: NumPy uses a slightly different formulation, which I personally don't like because it's not symmetric in its arguments, as tol is scaled with only one of them:

abs(v - v_ref) <= nom + tol*abs(v_ref)
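
To illustrate that asymmetry, a quick sketch (hypothetical helper names; the numbers are chosen only to show the effect):

using System;

class SymmetryDemo
{
    // math.isclose / isapprox style: symmetric in v and vRef
    static bool IsCloseSym(double v, double vRef, double tol, double nom)
        => Math.Abs(v - vRef) <= Math.Max(tol * Math.Max(Math.Abs(v), Math.Abs(vRef)), nom);

    // NumPy style: tol scales only with vRef
    static bool IsCloseNumpy(double v, double vRef, double tol, double nom)
        => Math.Abs(v - vRef) <= nom + tol * Math.Abs(vRef);

    static void Main()
    {
        // Swapping the arguments changes the NumPy-style verdict...
        Console.WriteLine(IsCloseNumpy(1002.002, 1000.0, 0.002, 0.0)); // False: 2.002 >  2.0
        Console.WriteLine(IsCloseNumpy(1000.0, 1002.002, 0.002, 0.0)); // True:  2.002 <= 2.004
        // ...but not the symmetric one.
        Console.WriteLine(IsCloseSym(1002.002, 1000.0, 0.002, 0.0));   // True
        Console.WriteLine(IsCloseSym(1000.0, 1002.002, 0.002, 0.0));   // True
    }
}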
casella commented 3 months ago

You saying "should be something like" implies to me that you're not totally sure about the specifics, yourself.

I meant that I didn't put too much effort into making a precise proposal, for lack of time 😅

In fact, what I wanted to write was

abs(v - v_ref)/max(abs(v), nom) < tol

(of course you need the absolute value on the LHS). This is equivalent to

abs(v - v_ref) < tol*max(abs(v), nom)
abs(v - v_ref) < max(tol*abs(v), tol*nom)

It seems reasonable to me to take the max between abs(v) and abs(v_ref), as suggested by @bilderbuchi. I'm not sure that formula is sensible regarding nom, though; I guess it should be

abs(v - v_ref) <= tol*max(max(abs(v), abs(v_ref)), nom)

as asking for

abs(v - v_ref) <= nom

seems a bit too lax to me.

Ditto for the NumPy formulation; I guess it should be

abs(v - v_ref) <= tol*nom + tol*abs(v_ref)

The idea is that the absolute value of the error should be less than the tolerance times the largest of three values: the variable itself, its reference value, and its nominal value, in order to avoid being too strict in cases where the variable is accidentally close to zero.
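
In code, the point-wise criterion would read something like this (a standalone sketch with hypothetical names, not CSV-compare's actual API):

using System;

class ErrorCriterion
{
    // abs(v - v_ref) <= tol * max(abs(v), abs(v_ref), nom)
    static bool WithinTolerance(double v, double vRef, double tol, double nom)
    {
        double scale = Math.Max(Math.Max(Math.Abs(v), Math.Abs(vRef)), nom);
        return Math.Abs(v - vRef) <= tol * scale;
    }

    static void Main()
    {
        // Numerical noise around an exact zero is accepted thanks to nom...
        Console.WriteLine(WithinTolerance(3e-8, 1e-8, 0.002, 1e-3)); // True:  2e-8 <= 2e-6
        // ...while the same check without a nominal value would reject it,
        Console.WriteLine(WithinTolerance(3e-8, 1e-8, 0.002, 0.0));  // False: 2e-8 >  6e-11
        // and a genuine 2% error on a signal of order 1 still fails.
        Console.WriteLine(WithinTolerance(1.02, 1.0, 0.002, 1e-3));  // False: 0.02 >  2.04e-3
    }
}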

@bilderbuchi are we on the same page?

casella commented 3 months ago

Of course this is based on the assumption that nom is the nominal value of variable v, i.e., its order of magnitude.

bilderbuchi commented 3 months ago

The idea is that the absolute value of the error should be less than the tolerance times the largest of three values: the variable itself, its reference value, and its nominal value, in order to avoid being too strict in cases where the variable is accidentally close to zero.

Yes, you are of course right. I made a sloppy mistake in "translating" the Python docs' example to your nomenclature above, in that I simply replaced atol (absolute tolerance) by nom. I didn't think that through; the replacement should be tol*nom in our situation, as you rightly point out.

casella commented 3 months ago

@beutlich do you have a clear idea about how to proceed now?

beutlich commented 3 months ago

Not sure if I found the right place to consider nominal values, but this quick-and-dirty change could work:

--- a/Modelica_ResultCompare/CurveCompare/TubeSize.cs
+++ b/Modelica_ResultCompare/CurveCompare/TubeSize.cs
@@ -126,7 +126,7 @@ namespace CurveCompare
         {
             double epsilon = 1e-12;
             baseX = Math.Max(Math.Max(reference.X.Max() - reference.X.Min(), Math.Abs(reference.X.Min())), epsilon);
-            ratio = Math.Max(Math.Max(reference.Y.Max() - reference.Y.Min(), Math.Abs(reference.Y.Min())), epsilon) / baseX;
+            ratio = Math.Max(Math.Max(Math.Max(reference.Y.Max() - reference.Y.Min(), Math.Abs(reference.Y.Min())), epsilon) / baseX, 0.001);
             baseY = baseX * ratio;
             return;
         }
casella commented 3 months ago

@beutlich I'm trying to figure out the code you pointed out

https://github.com/modelica-tools/csv-compare/blob/18b309dcd0b510abd2a911ed6fcd82245a80aa32/Modelica_ResultCompare/CurveCompare/TubeSize.cs#L99-L132

but I'm not familiar with it. The first question is: what is the difference between SetStandardBaseAndRatio() and SetFormerBaseAndRatio()? What do "standard values" and "former standard values" actually mean? Which of the two functions is actually used?

If SetFormerBaseAndRatio() is used, as you suggest, then I reckon we should just set epsilon to 0.001 instead of 1e-12 and keep the rest of the code as it is. This would mean that whenever the maximum of the reference's range and the absolute value of its minimum is a very small number (i.e., numerical noise around zero), we assume a default amplitude of 0.001 for all subsequent computations.
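
In diff form, something like this (untested sketch; note from the snippet above that epsilon also floors baseX, which would then be floored at 0.001 as well):

--- a/Modelica_ResultCompare/CurveCompare/TubeSize.cs
+++ b/Modelica_ResultCompare/CurveCompare/TubeSize.cs
-            double epsilon = 1e-12;
+            double epsilon = 0.001;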

Maybe we should also adapt SetStandardBaseAndRatio(), just in case?

beutlich commented 3 months ago

I am not sure either what "former" and "standard" refer to. When debugging, I noticed that the "former" function is called.

bilderbuchi commented 3 months ago

I'm confused about this part of the code: it seems to basically set the extent (BaseX, BaseY) and aspect ratio of the whole reference curve in question, and seems not to be involved in the tolerance computation at all? reference.X is an array of x values, of which Min() and Max() presumably find the lowest and highest values. So I'm not sure this is the right place to edit.

"Former" seems to refer to some formerly used method of computation: https://github.com/modelica-tools/csv-compare/blob/18b309dcd0b510abd2a911ed6fcd82245a80aa32/Modelica_ResultCompare/CurveCompare/TubeSize.cs#L88

bilderbuchi commented 3 months ago

The point where the tolerance should come into play is at https://github.com/modelica-tools/csv-compare/blob/18b309dcd0b510abd2a911ed6fcd82245a80aa32/Modelica_ResultCompare/CurveCompare/Algorithms/Rectangle.cs#L52-L53, where the lower and upper extent of the tube around the reference are computed (or the equivalent function in Ellipse.cs; I could not find out which one is ultimately used).

The confusing thing is that in that case the whole extent of the curve values would be used for creating the tube, which would be meaningless. Unfortunately, the code structure is full of indirections and similarly named variables, so it's very hard to find out more using just the GitHub code viewer. It could be that here we land in the Options1 branch and compute the tube size not in the standard way (from the above curve extents), but from some arbitrary Value, which I think would then end up being hardcoded to 0.002. AFAICT, this would put a tube of "width" (in some sense) 0.002 around the reference, irrespective of the absolute or relative tolerance of the nearest reference point.

You'd have to analyze the program's behaviour and results to find out whether that is the case, and ideally add some tests with constructed expected-pass and expected-fail curves to ensure the expected behaviour. AFAICT, this program does not have a test suite yet.

That's about as far as I can help here; I hope it does!

casella commented 3 months ago

I have no idea how this code actually works in detail, but to me the case is quite clear.

The "former" code, which according to @beutlich is the one that is actually executed, computes baseY as

Math.Max(Math.Max(reference.Y.Max() - reference.Y.Min(), Math.Abs(reference.Y.Min())), epsilon)

I'm not sure how exactly this baseY value is used, but it is clear that it ultimately determines the width and/or height of the tolerance tube based on the range of Y values of the reference file.

Here, the range is (most reasonably) defined as max(y_max - y_min, abs(y_min)); this ensures that if you have a signal with a wide span of y values, the span is taken as baseY, while if the signal is (nearly) constant, the absolute value of its minimum is taken instead. The corner case of a signal that is constantly zero was handled by adding a third max with a small epsilon value, to avoid baseY being zero.

So far so good, except that this results in overly tight tubes for variables that are near zero because of balance equations, but are supposed to have a range much larger than the actual span of values in the result file.

My argument (Ockham's razor rules!) is that we don't actually need to get into the details of how the tube algorithm works. We only need to make sure that baseY, i.e. the range of y values of the reference, is not excessively small for reference signals that happen to be nearly zero (e.g. because of balance equations) but are supposed to have a much wider a-priori range. So we should take baseY to be the max between the value as currently computed (which is OK in all cases except near-zero variables) and the nominal value.

Since we don't have a nominal attribute in the CSV files, @HansOlsson's pragmatic proposal, which I fully endorse, is to take a value of 0.001, which is small enough to avoid most of the false negatives we get now. The only drawback is that there could be some reference values with nominal values much smaller than this, in which case wrong values could be accepted. So be it; that's much better than removing the near-zero references from the reference files altogether, which could potentially lead to accepting a lot more errors.
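
To make this concrete, here is a standalone sketch (my own re-implementation of the snippet above, not the actual TubeSize.cs) of what the current and the patched computations yield for the v2 reference from my MWE:

using System;
using System.Linq;

class BaseYSketch
{
    static void Main()
    {
        double[] x = { 0.0, 1.0, 2.0, 3.0, 4.0 };
        double[] y = { 1e-8, -2e-8, 1.5e-9, 5e-9, -1e-8 }; // v2 reference: numerical noise
        double epsilon = 1e-12;

        // Mirrors SetFormerBaseAndRatio(): max(range, abs(min), epsilon)
        double baseX = Math.Max(Math.Max(x.Max() - x.Min(), Math.Abs(x.Min())), epsilon); // 4.0
        double spanY = Math.Max(Math.Max(y.Max() - y.Min(), Math.Abs(y.Min())), epsilon); // 3e-8

        double ratioOld = spanY / baseX;                  // 7.5e-9: an absurdly tight tube
        double ratioNew = Math.Max(spanY / baseX, 0.001); // floored at the nominal ratio

        Console.WriteLine(baseX * ratioOld); // baseY ~ 3e-8  (current)
        Console.WriteLine(baseX * ratioNew); // baseY = 0.004 (patched)
    }
}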

I just created PR #67 based on these considerations.

casella commented 3 months ago

BTW, if SetStandardBaseAndRatio() is actually not used and is thus dead code, we should probably remove it, or at least say so in a comment.

casella commented 2 months ago

Discussion during the MAP-LIB meeting Sep 10: go ahead with #67, get @GallLeo to compile it and use it for the new regression testing, and keep MAP-LIB in the loop.