@sergiopasra - just to check, does this depend on the optimization level you compile with?
@astrofrog Yes, with -O0 and -O1 it works; with -O2 and -O3 it doesn't (only on i386; x86_64 works with all levels)
@sergiopasra - do you see the same issue when compiling SOFA? If so, we'll need to report it upstream.
@astrofrog I obtain the same result if I change the compilation flags of SOFA from "-pedantic -Wall -W -O" to "-pedantic -Wall -W -O2". I'm not sure that they are going to consider this a bug.
How do we know this isn't a GCC6 compiler bug? How much does the value change between GCC5 and GCC6? Does reordering some expressions fix it? How much does the tolerance have to change to make the test pass (not very much)? What is the actual required precision of this calculation?
This also affects Debian: Bug 835105.
I played around a bit with the different optimization options: the source code in question is `src/starpv.c`, and the optimization option is `-fcaller-saves`. Applying `-O2 -fno-caller-saves` to `gcc -c starpv.c` works well; `-O1 -fcaller-saves` shows the failure.
So this looks like a GCC compiler bug then.
Agreed. I could extract the following code from `starpv.c`:
```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double f(double x) {
    return sqrt(1.0 - x) - 1.0;
}

double g(double x) {
    double res = f(x);
    printf("r =%.20g\n", res);
    return res;
}

int main(void) {
    double x = 3.1740879016271733482e-09;
    double r1 = f(x);
    double r2 = g(x);
    printf("r1=%.20g r2=%.20g diff=%20g\n", r1, r2, r2 - r1);
    exit(0);
}
```
This code should print the same number for `r1` and `r2`, and a diff of 0. When compiled with `gcc -O2 -o m m.c -lm`, I get the result
```
r =-1.5870439520936432953e-09
r1=-1.5870439407095204842e-09 r2=-1.5870439520936432953e-09 diff= -1.13841e-17
```
When I add `-fno-caller-saves`, everything is fine. The numbers are taken from the erfa test case.
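To make the mechanism visible without changing compiler flags, here is a minimal sketch (my own illustration, not code from `starpv.c`) that forces the intermediate through a 64-bit store, which is roughly what `-ffloat-store` (and, indirectly, `-fno-caller-saves`) achieves on x87:

```c
#include <math.h>
#include <stdio.h>

/* Same expression as f() above, but the intermediate is stored to a
 * volatile double, forcing it to be rounded to 64 bits before the
 * subtraction. On x87 this suppresses the 80-bit excess precision;
 * with SSE math it changes nothing. */
double f_stored(double x) {
    volatile double s = sqrt(1.0 - x);  /* rounded to double here */
    return s - 1.0;
}

int main(void) {
    double x = 3.1740879016271733482e-09;
    printf("f_stored = %.20g\n", f_stored(x));
    return 0;
}
```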
Not really a gcc bug: i386 does not have reliable floating-point math due to the 80-bit x87 FPU. If you need it, try `-fexcess-precision=standard`, `-ffloat-store`, or `-mfpmath=sse`; see https://gcc.gnu.org/wiki/x87note (or the very old gcc issue 323).
@juliantaylor When thinking about the numbers here, I don't think this is the issue: with double precision (64 bit), we have a mantissa of 52(+1) bits and a precision of ~16 digits. The two numbers in the test case, however, already differ in the 8th digit. Having 80 bits here, or converting between 80 and 64 bits, cannot explain this.
I don't know what you are computing, but that they differ in the 8th digit does not necessarily imply that the excess precision is not to blame: rounding differences can easily be amplified. That `-fno-caller-saves` "fixes" it also speaks for it; that flag basically causes a limited form of `-ffloat-store` (but possibly worse for performance, as it also affects non-float register allocation). Does `-fexcess-precision=standard` fix the issue?
The code is above, and I see nothing there that could amplify the error. From rounding 80 to 64 bits, I see no reason why it should be legitimate to have an error in the 8th digit; the error should be on the order of the precision. `-fexcess-precision=standard` fixes the problem, however.
I don't think it's obvious to say that from that code; there are several things in it that could amplify it: subtractions, divisions, it's got it all. An actual compiler mis-compilation that produces an 8th-digit accuracy difference, on the other hand, seems less likely to me, in particular one that disabling caller saves fixes. In the end only a proper numerical analysis or a look at the disassembly can prove what is to blame, but for me the evidence that it is the well-known x87 issue is pretty overwhelming, while a mis-compilation is still just speculation. Then it's not really worthwhile to dive deeper into this to see what it is; you should never do numerics on x87 FPUs without proper compiler flags anyway (`-mfpmath=sse` is the best option for determinism and performance).
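For what it's worth, the amplification is easy to demonstrate directly. In a sketch of my own (using `long double` to stand in for the 80-bit x87 registers), the square root lands at about 1 - 1.6e-9, so a ~1e-16 rounding difference in it turns into a ~1e-7 relative difference once 1 is subtracted, i.e. a disagreement around the 8th digit:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 3.1740879016271733482e-09;

    /* intermediate kept in extended precision, as x87 register
     * allocation may do: */
    long double s80 = sqrtl(1.0L - (long double)x);
    double r80 = (double)(s80 - 1.0L);

    /* intermediate rounded to 64-bit double first: */
    double s64 = sqrt(1.0 - x);
    double r64 = s64 - 1.0;

    printf("80-bit intermediate: %.20g\n", r80);
    printf("64-bit intermediate: %.20g\n", r64);
    printf("abs diff: %g\n", r80 - r64);
    return 0;
}
```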
OK, agreed (and I am going to close the gcc bug here as well). However, I am not happy with just enabling these flags -- IMO they don't solve the underlying problem, but hide it: the function `eraStarpv()` is quite inaccurate because of `sqrt(1.0 - betr*betr - bett*bett) - 1.0` if `betr` and `bett` are very small. In the test case of astropy, they are ~10^-4, which translates into an accuracy of only ~8 digits. Astropy, however, seems to need a higher accuracy in this loop when handling ephemerides.
IMO, there are two things needed/possible here:

1. Improve the accuracy of `eraStarpv()` for small `betr` and `bett` (use a polynomial instead of the square root in this case, or similar; see the sketch after this list). This should be discussed with upstream. Or at least the limited precision should be documented.
2. Check whether the accuracy that astropy requires from `eraStarpv()` is really that high, and change the loop accordingly. I will copy my comments to astropy/astropy#5425 as well.

The point here is IMO that the x87 FPU does not give wrong results, just different ones. Both results are as expected within the accuracy; the 64-bit results are just more predictable. This gives us the chance to find exactly such problems, and therefore I would vote against just setting `-mfpmath=sse` or such. Instead, i386 CI tests would be quite useful.
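A minimal sketch of the polynomial idea from item 1 (my own illustration, not proposed ERFA code; the cutoff is chosen only for this example):

```c
#include <math.h>

/* sqrt(1 - x) - 1 via its Taylor series for small x:
 *   sqrt(1 - x) - 1 = -x/2 - x^2/8 - x^3/16 - ...
 * Below the cutoff the truncated series is accurate to roughly
 * 1 ulp, while the direct formula loses about half its digits
 * to cancellation. */
double sqrt1m_minus1(double x) {
    if (fabs(x) < 1e-5)
        return -x / 2.0 - x * x / 8.0 - x * x * x / 16.0;
    return sqrt(1.0 - x) - 1.0;
}
```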
Indeed, that expression is poor for small values. Herbie recommends this replacement: http://herbie.uwplse.org/demo/6c625a8ebdbd46954b959c975040214b/graph.html
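Herbie's exact suggestion is behind the link; the textbook alternative, for comparison, multiplies by the conjugate, which removes the cancellation entirely:

```c
#include <math.h>

/* Algebraically identical to sqrt(1 - x) - 1, but without the
 * subtraction of nearly equal numbers, so it stays accurate for
 * tiny x on any FPU: */
double f_stable(double x) {
    return -x / (sqrt(1.0 - x) + 1.0);
}
```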
Funnily (but as expected), the i386 result seems to be the better one ;-)
It's not actually very surprising: the x87 FPU does have 80-bit precision for intermediate results, so it being better is not unusual. But its indeterminism often outweighs this. If you have something that needs 80 bits, you can use `long double`, at some performance cost on newer machines (and less portability, as some arches don't have 80-bit long doubles).
Well, deterministically inaccurate results are not really what one wants...
You can easily construct cases where 80 bits are not enough either; it's just shifting the problem a tiny bit. Determinism is far more valuable.
An algorithm should give predictable results within the expected accuracy. If it does not, it is the algorithm which is to blame, not the "unpredictability" of the FPU. If we had just enabled `-mfpmath=sse`, we would not have detected the cause of the problem. Therefore, unpredicted results within the specified accuracy help to keep algorithms good, while `-mfpmath=sse` just makes them look good and hides the real problem.
Just make sure that the algorithm is robust; then one doesn't need to care about additional unexpected precision.
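One way to make "robust" concrete is to check the expression against a higher-precision reference over the relevant range. A sketch of my own (the stable form is the conjugate rewrite from above):

```c
#include <math.h>
#include <stdio.h>

/* Compare the naive and the rewritten expression against a
 * long-double reference. A robust formulation should stay within
 * a few ulps everywhere, independent of FPU mode or flags. */
int main(void) {
    for (double x = 1e-12; x < 1e-2; x *= 10.0) {
        double naive  = sqrt(1.0 - x) - 1.0;
        double stable = -x / (sqrt(1.0 - x) + 1.0);
        double ref    = (double)(sqrtl(1.0L - (long double)x) - 1.0L);
        printf("x=%8.1e  naive relerr=%9.2e  stable relerr=%9.2e\n",
               x, fabs(naive - ref) / fabs(ref),
               fabs(stable - ref) / fabs(ref));
    }
    return 0;
}
```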
I'm trying to compile erfa for the new Fedora release, which comes with gcc 6. This is causing a problem in one of the tests. In particular, I see this problem on i386 (on x86_64 the test passes):
```
eraFk52h failed: rv want -7.6000000940000251859 got -7.6000000939851357629 (1/5.1e+11)
t_erfa_c validation failed!
```
The radial velocity is computed here:
https://github.com/liberfa/erfa/blob/dc8292fd26a9cdfc504e210d912e52c3f73a9185/src/pvstar.c#L148