google / benchmark

A microbenchmark support library
Apache License 2.0
8.94k stars 1.62k forks source link

bug and potential fix for using long double with DoNotOptimize and gcc #903

Open KlausLeppkes opened 4 years ago

KlausLeppkes commented 4 years ago

Hi,

1.) I discovered that g++ has a problem with +m inside DoNotOptimize. As Jakub Jelinek stated in my gcc bugreport, +g works instead of +m (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92597). 2.) I attached the small example (test.cpp+makefile), which gives wrong values using g++[9.2] with +m. Also, if you define -DFIX inside the makefile, gcc is working fine. Clang and Intel seem to not having this problem, but are also checked.

For reproduction, g++ is called to compile with +m: $>make ... FMA: 0; TYPE: double; ASM Volatile: 1 => y=6.9529e-305 FMA: 1; TYPE: long double; ASM Volatile: 1 => y=-nan make: *** [makefile:17: diff_gcc] Error 1

If you uncomment the -DFIX inside the CFLAGS, gcc produces the correct numbers.

I don't know if this fixes the problem entirely, but I wanted to let you know that there is a problem and a workaround for gcc.

Cheers, Klaus

googlebench_bugreport.zip

dmah42 commented 4 years ago

@KlausLeppkes thanks for the comprehensive bug report!

@EricWF this is your area of the code and you know it best so over to you :)

KlausLeppkes commented 4 years ago

I searched for some doc and found: "g" : Any register, memory or immediate integer operand is allowed, except for registers that are not general registers." (x86 specified!). vs. "m" : A memory operand is allowed, with any kind of address that the machine supports in general. source: https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

@EricWF: The helper functions are for x86 anyway, so why not changing m into g everywhere?

dmah42 commented 3 years ago

closed the PR but i'd like to understand if there's something that we need to do here after all.

MatejKafka commented 6 months ago

GCC maintainers are pretty adamant that this is not a compiler bug, and that the "+m,r" is incorrect. Not that it is documented anywhere, so we're gonna have to take their word for it. :)

The issue is that the in/out operand is internally split into 2 separate operands: __asm__ __volatile__("" : "=m,r"(value) : "m,0"(value) : "memory");. When DoNotOptimize is used to obfuscate a fixed input to a benchmark, this results in value never being initialized:

extern int external_fn(int);
extern int arg;

int fn() {
    arg = 5;
    asm volatile("" : "+m,r"(arg) : : "memory"); // inlined content of `DoNotOptimize`
    return external_fn(arg);
}

The inline asm statement in this snippet is turned into __asm__ __volatile__("" : "=m,r"(arg) : "m,0"(5) : "memory");, which results in gcc placing the constant 5 into memory, passing the address as the input argument and expecting the asm block to store the result into arg, resulting in calling external_fn with an uninitialized argument.

See the following for a repro: https://godbolt.org/z/W7GYMcann