niklas-heer / speed-comparison

A repo which compares the speed of different programming languages.
https://niklas-heer.github.io/speed-comparison
MIT License

Different language models for float computation #26

Closed. workingjubilee closed this issue 2 years ago.

workingjubilee commented 2 years ago

It's worth noting the languages produce wildly divergent results:

| language | version | time (rounded) | accuracy |
| --- | --- | --- | --- |
| C (gcc) | 11.2.1 | 5.210 | 0.3888888888888889 |
| Rust | 1.60.0 | 6.121 | 0.7222222222222222 |
| C++ (g++) | 11.2.1 | 8.621 | 0.3888888888888889 |
| Go | 1.19.1 | 12.789 | 0.7222222222222222 |
| Crystal | 1.4.1 | 45.557 | 0.6666666666666666 |
| PHP | 8.1.11 | 70.694 | 0.7222222222222222 |
| Nim | 1.6.6 | 164.390 | 0.6666666666666666 |
| Javascript (nodejs) | 18.9.1 | 185.484 | 0.7222222222222222 |
| Java | 19.36 | 196.741 | 0.7222222222222222 |
| Python (PyPy) | 3.9.12 | 197.456 | 0.7222222222222222 |
| Julia | 1.6.7 | 733.121 | 0.6666666666666666 |
| Swift | 5.7 | 948.922 | 0.6666666666666666 |
| Ruby | 3.1.2 | 1137.632 | 0.6666666666666666 |
| Python (CPython) | 3.10.5 | 1499.846 | 0.7222222222222222 |
| R | 4.2.0 | 1536.440 | 0.6666666666666666 |
| Elixir | 1.13.4 | 3118.417 | 0.6111111111111112 |

One might incorrectly assume they should all produce roughly the same result: that would only hold if they all used the same floating-point implementation, and even then some divergence would still be expected. But the C and C++ results are wildly different from everything else.

Part of that is due to the C version currently using a float (i.e. "binary32"), which is less accurate. However, in my experiments, even revising it to use a double, as the other languages do, can produce a different result, so I don't think that's the entire explanation. Part of it, I expect, comes from the implicit numeric promotion in the C language.
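
To illustrate the binary32 point, here is a minimal self-contained sketch. It is not the repo's actual benchmark (the loop shape and iteration count are my assumptions); it just accumulates the Leibniz series for pi in float and double side by side, and the two sums visibly diverge before any compiler flags enter the picture:

```c
#include <stdio.h>

int main(void) {
    float  sum_f = 0.0f; /* binary32 accumulator */
    double sum_d = 0.0;  /* binary64 accumulator */
    for (int i = 0; i < 10000000; i++) {
        float  sign_f = (i % 2 == 0) ? 1.0f : -1.0f;
        double sign_d = (i % 2 == 0) ? 1.0  : -1.0;
        /* same term, rounded at binary32 vs binary64 precision */
        sum_f += sign_f / (2.0f * (float)i  + 1.0f);
        sum_d += sign_d / (2.0  * (double)i + 1.0);
    }
    printf("binary32: %.17f\n", 4.0f * sum_f);
    printf("binary64: %.17f\n", 4.0  * sum_d);
    return 0;
}
```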

Another possible reason is that gcc's default compilation settings allow it to perform various implicit """optimizations""" that are illegal under most language semantics: they can make certain calculations more precise by reducing the number of roundings, but make other calculations fail by introducing inaccuracy instead (e.g. making `0.0 == (y * x + b) - (y * x + b)` evaluate to false). One of these in gcc is toggled by `-ffp-contract`, which defaults to `fast`.
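
As a sketch of what contraction actually changes, the following program makes the fusion explicit via C99's `fma()` (link with `-lm`). Compiled with `gcc -O2 -ffp-contract=off`, the "separate" line prints 0; with `-ffp-contract=fast` on a target that has hardware FMA (e.g. `-march=haswell` on x86-64), gcc may fuse the plain expression too, so both lines print the same nonzero residue:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* volatile stops gcc from constant-folding the values away,
       but does not forbid contraction of the arithmetic itself */
    volatile double y = 1.0 / 3.0;
    volatile double x = 3.0;
    volatile double b = -1.0;

    double separate = y * x + b;    /* two roundings: round(round(y*x) + b) */
    double fused    = fma(y, x, b); /* one rounding:  round(y*x + b)        */

    /* round-to-nearest makes round(y*x) exactly 1.0 here, so the separate
       version yields 0.0, while the fused version keeps the residue of
       y*x, roughly -5.55e-17 */
    printf("separate: %.17g\n", separate);
    printf("fused:    %.17g\n", fused);
    return 0;
}
```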

For that one, at least, using something like it might be intentional in this case, as it certainly seems to make the Leibniz computation go... faster, I guess. If so, then all the languages that have an `fma` or `fmaf` function should probably be (optionally) using it; see the sketch below. An FPU with "fused multiply-add", which most have nowadays, actually performs multiplication as something roughly equivalent to `fma(a, b, 0.0)` and addition as `fma(1.0, b, c)`, where `fma(a, b, c)` computes `a * b + c` in "one step", i.e. with a single rounding.
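
If the project wanted to opt in explicitly rather than relying on `-ffp-contract`, a sketch of what that could look like for the C entry (again, the loop shape is my assumption, not the repo's actual code) is:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double sum  = 0.0;
    double sign = 1.0;
    for (long i = 0; i < 10000000L; i++) {
        /* sum = sign * term + sum with a single rounding, regardless
           of the compiler's -ffp-contract setting */
        sum = fma(sign, 1.0 / (2.0 * (double)i + 1.0), sum);
        sign = -sign;
    }
    printf("%.17f\n", 4.0 * sum);
    return 0;
}
```

Whether every language entry should make the same choice is exactly what this issue is about, since it changes both the speed and the computed digits.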

niklas-heer commented 2 years ago

Thank you very much for providing this information @workingjubilee!

I'll try to display this information in the generated image. The lighter the green, the less accurate the result.

[plot: generated results image, bars shaded green by accuracy]