llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[clang] Complex multiplications are not correctly rounded. #113272

Open Sh0g0-1758 opened 6 days ago

Sh0g0-1758 commented 6 days ago

See this report: https://inria.hal.science/hal-04714173, which hints at an accuracy issue in complex floating-point multiplication.

This can be observed in the following example, which compares the results from GNU MPC (arbitrary precision) with the results from clang (trunk) and gcc (trunk) in single precision (32-bit float). Hex digits that differ between rows are set off with apostrophes.

a (GNU MPC)         = 0x1.9387bep+0 + 0x1.485eb4p+1*i
a (clang and gcc)   = 0x1.9387bep+0 + 0x1.485eb4p+1*i
b (GNU MPC)         = -0x1.8bee4ep+1 + 0x1.039aep+2*i
b (clang and gcc)   = -0x1.8bee4ep+1 + 0x1.039aep+2*i

(a * b) (GNU MPC)   = -0x1.e904f'e'p+3 + -0x1.8a95'4e'p+0*i
(a * b) (gcc -O3)   = -0x1.e904f'e'p+3 + -0x1.8a95'58'p+0*i
(a * b) (clang -O3) = -0x1.e904f'c'p+3 + -0x1.8a95'58'p+0*i
(a * b) (clang)     = -0x1.e904f'c'p+3 + -0x1.8a95'58'p+0*i
(a * b) (gcc)       = -0x1.e904f'c'p+3 + -0x1.8a95'58'p+0*i

efriedma-quic commented 6 days ago

Note that on Linux targets, clang uses libgcc by default, so no LLVM code is actually involved in the computation. We do use the compiler-rt implementation on some targets, though (compiler-rt/lib/builtins/mulsc3.c).

I don't know of a fast algorithm that's correctly rounded... do you know about any research in that direction? I'm not sure we can reasonably do anything here without that. (MPC can do infinite-precision arithmetic, but that's quite slow relative to using native hardware FP.)

CC @lntue

lntue commented 5 days ago

I have some ideas, so we plan to try implementing them in LLVM libc and compare the performance. If that works out OK, we will see how to port it back.