liberfa / erfa

Essential Routines for Fundamental Astronomy. Maintainers: @eteq @mhvk @sergiopasra

Problem with gcc 6 and i386 #33

Closed sergiopasra closed 7 years ago

sergiopasra commented 8 years ago

I'm trying to compile erfa for the new Fedora release, which ships with gcc 6. This causes a problem in one of the tests. In particular, I see the failure on i386 (on x86_64 the test passes):

eraFk52h failed: rv want -7.6000000940000251859 got -7.6000000939851357629 (1/5.1e+11) t_erfa_c validation failed!

The radial velocity is computed here:

https://github.com/liberfa/erfa/blob/dc8292fd26a9cdfc504e210d912e52c3f73a9185/src/pvstar.c#L148

astrofrog commented 8 years ago

@sergiopasra - just to check, does this depend on the optimization level you compile with?

sergiopasra commented 8 years ago

@astrofrog Yes: with -O0 and -O1 it works, with -O2 and -O3 it doesn't (only on i386; x86_64 works with both).

astrofrog commented 8 years ago

@sergiopasra - do you see the same issue when compiling SOFA? If so, we'll need to report it upstream.

sergiopasra commented 8 years ago

@astrofrog I get the same failure if I change the compilation flags of SOFA from "-pedantic -Wall -W -O" to "-pedantic -Wall -W -O2". I'm not sure they are going to consider this a bug.

timj commented 8 years ago

How do we know this isn't a GCC6 compiler bug? How much does the value change between GCC5 and GCC6? Does reordering some expressions fix it? How much does the tolerance have to change to make the test pass (not very much)? What is the actual required precision of this calculation?

olebole commented 8 years ago

This also affects Debian: Bug 835105.

olebole commented 8 years ago

I played around a bit with the different optimization options: the source file in question is src/starpv.c, and the relevant optimization flag is -fcaller-saves. Applying -O2 -fno-caller-saves to gcc -c starpv.c works well, while -O1 -fcaller-saves shows the failure.

timj commented 8 years ago

So this looks like a GCC compiler bug then.

olebole commented 8 years ago

Agreed. I could extract the following code from starpv.c:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* the cancellation-prone expression from starpv.c */
double f(double x) {
  return sqrt(1.0 - x) - 1.0;
}

/* same computation, but the intermediate result is printed before returning */
double g(double x) {
  double res = f(x);
  printf("r =%.20g\n", res);
  return res;
}

int main(void) {
  double x = 3.1740879016271733482e-09;  /* value taken from the erfa test case */
  double r1 = f(x);
  double r2 = g(x);
  printf("r1=%.20g r2=%.20g diff=%20g\n", r1, r2, r2 - r1);
  exit(0);
}

This code should print the same number for r1 and r2, and a diff of 0. When compiled with gcc -O2 -o m m.c -lm, I get the result

r =-1.5870439520936432953e-09
r1=-1.5870439407095204842e-09 r2=-1.5870439520936432953e-09 diff=        -1.13841e-17

When I add -fno-caller-saves, everything is fine. The numbers are taken from the erfa test case.

olebole commented 8 years ago

FYI: I filed a gcc bug for this issue.

juliantaylor commented 8 years ago

Not really a gcc bug: i386 does not have reliable floating-point math due to the 80-bit x87 FPU. If you need it, try -fexcess-precision=standard, -ffloat-store, or -mfpmath=sse; see https://gcc.gnu.org/wiki/x87note (or the very old gcc issue 323).
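
To illustrate what "excess precision" means here, a minimal sketch (not from erfa or this thread) whose output depends on whether the compiler keeps the intermediate sum in an 80-bit x87 register:

#include <stdio.h>

int main(void) {
  volatile double a = 1e16;  /* volatile keeps the compiler from folding the expression */
  volatile double b = 1.0;
  double r = (a + b) - a;
  /* With strict 64-bit doubles, a + b rounds back to 1e16, so r is 0.
     With x87 excess precision, a + b is held exactly in an 80-bit register, so r is 1.
     -fexcess-precision=standard, -ffloat-store or -mfpmath=sse force the 64-bit result. */
  printf("r = %g\n", r);
  return 0;
}

Built with default flags for an i386 target this will typically print 1, while an SSE or strict-precision build prints 0.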

olebole commented 7 years ago

@juliantaylor Thinking about the numbers here, I don't believe this is the issue: with double precision (64 bit) we have a 52(+1)-bit mantissa and a precision of ~16 digits. The two numbers in the test case, however, already differ in the 8th digit. Having 80 bits here, or converting between 80 and 64 bits, cannot explain this.

juliantaylor commented 7 years ago

I don't know what you are computing, but the fact that the results differ in the 8th digit does not necessarily imply that excess precision is not to blame: rounding differences can easily be amplified. That -fno-caller-saves "fixes" it also speaks for this; that flag basically causes a limited form of -ffloat-store (but is possibly worse for performance, as it also affects non-float register allocations). Does -fexcess-precision=standard fix the issue?

olebole commented 7 years ago

The code is above, and I see nothing there that could amplify the error. From rounding 80 -> 64 bit, I see no reason why an error in the 8th digit should be legitimate; the error should be on the order of the precision. -fexcess-precision=standard does fix the problem, however.

juliantaylor commented 7 years ago

I don't think it's obvious to say that from that code: there are several things in it that could amplify it (subtractions, divisions, it has it all). An actual compiler mis-compilation that produces an 8th-digit accuracy difference, on the other hand, seems less likely to me, in particular one that disabling caller saves fixes. In the end, only a proper numerical analysis or a look at the disassembly can prove what is to blame, but to me the evidence that this is the well-known x87 issue is pretty overwhelming, while a mis-compilation is still just speculation. It is not really worthwhile to dig deeper to see what it is; you should never do numerics on x87 FPUs without proper compiler flags anyway (-mfpmath=sse is the best option for determinism and performance).
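
As a back-of-envelope check of the amplification argument, here is a minimal sketch that perturbs sqrt(1 - x) by a single double ulp near 1.0 and shows how large the relative change of sqrt(1 - x) - 1.0 becomes for the value from the extracted test case:

#include <math.h>
#include <stdio.h>

int main(void) {
  double x  = 3.1740879016271733482e-09;  /* from the extracted test case above */
  double s  = sqrt(1.0 - x);              /* ~ 1 - 1.6e-9 */
  double s2 = nextafter(s, 2.0);          /* s shifted by one ulp (~1.1e-16) */
  double r  = s  - 1.0;
  double r2 = s2 - 1.0;
  /* The absolute shift of ~1e-16 is tiny, but relative to the ~1.6e-9 result
     it is ~7e-8, i.e. a change in the 8th significant digit. */
  printf("r  = %.20g\nr' = %.20g\nrelative change = %g\n", r, r2, (r2 - r) / r);
  return 0;
}

The absolute difference of ~1.1e-17 between r1 and r2 observed above is well within this one-ulp budget, so an 8th-digit discrepancy is consistent with a single extra rounding of the intermediate.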

olebole commented 7 years ago

OK, agreed (and I am going to close the gcc bug as well). However, I am not happy with just enabling these flags; IMO they don't solve the underlying problem, but hide it: the function eraStarpv() is quite inaccurate because of sqrt(1.0 - betr*betr - bett*bett) - 1.0 when betr and bett are very small. In the test case of astropy they are ~10^-4, which translates into an accuracy of only ~8 digits. Astropy, however, seems to need a higher accuracy in this loop when handling ephemerides. IMO, there is one thing needed/possible here:

The point here is, IMO, that the x87 FPU does not give wrong results, just different ones. Both results are as expected within the accuracy; the 64-bit results are just more predictable. This gives us the chance to find exactly such problems, and therefore I would vote against just setting -mfpmath=sse or the like. Instead, i386 CI tests would be quite useful.

juliantaylor commented 7 years ago

Indeed, that expression is poor for small values. Herbie recommends this replacement: http://herbie.uwplse.org/demo/6c625a8ebdbd46954b959c975040214b/graph.html
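
For reference, the standard way to make that expression robust for small arguments (whether or not this is exactly what the linked Herbie page proposes) is to multiply by the conjugate: sqrt(1 - x) - 1 = -x / (sqrt(1 - x) + 1). A minimal sketch with hypothetical helper names, using the value from the extracted test case:

#include <math.h>
#include <stdio.h>

/* Naive form: subtracting two nearly equal numbers causes cancellation. */
static double delta_naive(double x) {
  return sqrt(1.0 - x) - 1.0;
}

/* Conjugate rewrite: sqrt(1-x) - 1 == -x / (sqrt(1-x) + 1), no cancellation. */
static double delta_stable(double x) {
  return -x / (sqrt(1.0 - x) + 1.0);
}

int main(void) {
  double x = 3.1740879016271733482e-09;  /* value from the extracted test case */
  printf("naive : %.20g\n", delta_naive(x));
  printf("stable: %.20g\n", delta_stable(x));
  return 0;
}

On a strict 64-bit build the naive form keeps only about 8 significant digits here, while the conjugate form is accurate to full double precision.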

olebole commented 7 years ago

Funnily (but as expected), the i386 result seems to be the better one ;-)

juliantaylor commented 7 years ago

It's not actually very surprising: the x87 FPU does have 80-bit precision for intermediate results, so its result being better is not unusual. But its indeterminism often outweighs this. If you have something that needs 80 bits, you can use long doubles, at some performance cost on newer machines (and with less portability, as some architectures don't have 80-bit long doubles).
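
Applied to the extracted test case, a minimal sketch of that long double variant (assuming an architecture where long double is the 80-bit extended type) could look like this:

#include <math.h>
#include <stdio.h>

int main(void) {
  double x = 3.1740879016271733482e-09;  /* value from the extracted test case */
  /* Keep the intermediate in long double and round to double only once at the end;
     on x86 this mimics what the x87 register path otherwise does implicitly. */
  double r = (double)(sqrtl(1.0L - (long double)x) - 1.0L);
  printf("r = %.20g\n", r);
  return 0;
}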

olebole commented 7 years ago

Well, deterministically inaccurate results are not really what one wants...

juliantaylor commented 7 years ago

You can easily construct cases where 80 bits are not enough either; it just shifts the problem a tiny bit. Determinism is far more valuable.

olebole commented 7 years ago

An algorithm should give predictable results within the expected accuracy. If it does not, it is the algorithm that is to blame, not the "unpredictability" of the FPU. If we had just enabled -mfpmath=sse, we would not have detected the cause of the problem. Therefore, unpredictable results within the specified accuracy help to keep algorithms good, while -mfpmath=sse just makes them look good and hides the real problem. Just make sure that the algorithm is robust; then one doesn't need to care about additional, unexpected precision.