Open Krasjet opened 3 years ago
This is strange. After I replace every instance of `DoNotOptimize` with `escape`, `flush_32` always runs around 0.3 ns slower no matter how many times I rerun it, even though it is exactly the same as `escape_32`:
#include <benchmark/benchmark.h>
#include <cmath>
#include <cfloat>

#define FLUSH(x) ((x) = fabs(x)<DBL_MIN ? 0 : (x))
#define FLUSHF(x) ((x) = fabsf(x)<FLT_MIN ? 0 : (x))

template <class Tp>
inline void escape(Tp& value)
{
    asm volatile("" :: "g"(value) : "memory");
}

static void
flush_64(benchmark::State& state)
{
    double mem = FLT_MIN;
    for (auto _ : state) {
        escape(mem = 0.999 * mem);
        escape(FLUSH(mem));
    }
}
BENCHMARK(flush_64);

static void
flush_32(benchmark::State& state)
{
    float mem = FLT_MIN;
    for (auto _ : state) {
        escape(mem = 0.999f * mem);
        escape(FLUSHF(mem));
    }
}
BENCHMARK(flush_32);

static void
escape_32(benchmark::State& state)
{
    float mem = FLT_MIN;
    for (auto _ : state) {
        escape(mem = 0.999f * mem);
        escape(FLUSHF(mem));
    }
}
BENCHMARK(escape_32);

BENCHMARK_MAIN();
$ g++ -O3 bug.cc -lbenchmark
$ ./a.out
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
flush_64 0.693 ns 0.693 ns 858596482
flush_32 1.01 ns 1.01 ns 691820352
escape_32 0.689 ns 0.688 ns 1000000000
However, if we set the constraint to `"r"` instead, which is what folly uses,
template <class Tp>
inline void escape(Tp& value)
{
    asm volatile("" :: "r"(value));
}
the execution times are the same:
$ g++ -O3 bug.cc -lbenchmark
$ ./a.out
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
flush_64 0.678 ns 0.677 ns 884857534
flush_32 0.678 ns 0.677 ns 1000000000
escape_32 0.678 ns 0.677 ns 994171725
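The two `escape` variants compared above can also be placed side by side in one file, so that the generated code for each can be inspected (for example with `g++ -O3 -S`). This is only a sketch, assuming GCC on x86-64: the names `escape_g`, `escape_r`, `run_with_g`, and `run_with_r` are hypothetical and do not appear in the report. As far as the GCC constraint documentation goes, `"g"` allows a general register, a memory operand, or an immediate, while `"r"` requires a general register; the `"memory"` clobber is present only in the first variant.

```cpp
#include <cfloat>
#include <cmath>

// Same definition as the FLUSHF macro in the listing above.
#define FLUSHF(x) ((x) = fabsf(x) < FLT_MIN ? 0 : (x))

// Variant from the first listing: "g" constraint plus a "memory" clobber.
template <class Tp>
inline void escape_g(Tp& value)
{
    asm volatile("" :: "g"(value) : "memory");
}

// Variant from the second listing: "r" constraint, no clobber.
template <class Tp>
inline void escape_r(Tp& value)
{
    asm volatile("" :: "r"(value));
}

// Hypothetical driver: runs the benchmark loop body `steps` times
// using the "g" variant and returns the final value.
float run_with_g(float mem, int steps)
{
    for (int i = 0; i < steps; ++i) {
        escape_g(mem = 0.999f * mem);
        escape_g(FLUSHF(mem));
    }
    return mem;
}

// Hypothetical driver: the same loop using the "r" variant.
float run_with_r(float mem, int steps)
{
    for (int i = 0; i < steps; ++i) {
        escape_r(mem = 0.999f * mem);
        escape_r(FLUSHF(mem));
    }
    return mem;
}
```

Both drivers should compute the same values, since the empty `asm` statement only takes `value` as an input; any difference between them would be in the instructions the compiler emits around it, which is what the timings above seem to reflect.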
Describe the bug
`DoNotOptimize` seems to have unpredictable behavior on ternary conditionals.
I'm trying to benchmark the performance of manually flushing denormals to zero, and I define a macro like this:
which basically flushes any denormal float (i.e. any value whose magnitude is below `FLT_MIN`) to 0, to prevent the performance degradation caused by denormal numbers on x86.
This macro is used in the following benchmark:
When compiling with `g++` at `-O3`, `FLUSHF` doesn't seem to work correctly and the flushing does not happen, which results in slower execution.
To see this in action, try compiling the following with `-O2`, with `-O3`, and with `clang++`, and compare the performance (you might need an x86 machine):
Because `FLUSHF` is partially optimized away by `gcc` at `-O3`, it runs much slower.
Strangely, the flushing works fine with `double`. I also tried the original `escape` function from the video mentioned in the comment,
and it works correctly. Try compiling the following to see the problem:
The anomaly of `flush_32` with `gcc -O3` is apparently a problem.
Is there a way to fix this problem without manually introducing another non-portable function to escape the optimization?
System
Which OS, compiler, and compiler version are you using:
Expected behavior
`DoNotOptimize` works predictably on ternary conditionals.
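For reference, the intended effect of the `FLUSHF` macro described in this report can be checked in isolation, without any benchmark harness. The sketch below assumes default IEEE 754 floating-point behavior (no `-ffast-math`, no flush-to-zero mode enabled); the helpers `is_denormal`, `step_and_flush`, and `check_flushf` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Same definition as the FLUSHF macro described in the report.
#define FLUSHF(x) ((x) = fabsf(x) < FLT_MIN ? 0 : (x))

// Hypothetical helper: true if v is a nonzero subnormal (denormal) float.
static bool is_denormal(float v)
{
    return std::fpclassify(v) == FP_SUBNORMAL;
}

// Hypothetical helper: one iteration of the benchmark body,
// without any escape / DoNotOptimize call.
static float step_and_flush(float mem)
{
    mem = 0.999f * mem;
    FLUSHF(mem);
    return mem;
}

// Hypothetical self-check of the behavior the report expects.
static void check_flushf()
{
    // FLT_MIN is the smallest normal float, so scaling it by 0.999f
    // yields a denormal value.
    assert(is_denormal(0.999f * FLT_MIN));

    // After flushing, that denormal becomes exactly zero.
    assert(step_and_flush(FLT_MIN) == 0.0f);

    // Values that stay in the normal range are left untouched by the flush.
    assert(!is_denormal(step_and_flush(1.0f)));
    assert(step_and_flush(1.0f) == 0.999f);
}
```

Since the benchmarks start from `FLT_MIN`, the first multiplication already produces a denormal, so a working flush should keep `mem` at zero for every later iteration.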