We should compile with -fno-math-errno for more efficient code (I am not sure whether this only affects gcc, or also clang). Obviously this should be a private compilation flag, for libNCrystal only.
Specifically, I discovered the need for this when trying to get SIMD auto-vectorisation to work on a large number of neutrons: no matter what I tried, I could not get a simple loop like the following to vectorise:
for ( unsigned i = 0; i < n; ++i )
  v[i] = std::sqrt(x[i]);
The problem is that the errno handling adds a branch inside the sqrt implementation (sqrt has to set errno when given a negative argument). Using __builtin_sqrt or sqrt(fabs(x[i])) did not help either. However, compiling with -fno-math-errno did the trick.
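For anyone who wants to reproduce this, here is a minimal sketch of the loop as a standalone translation unit. The function name sqrt_all and the file name in the build comment are hypothetical and not part of NCrystal. The build line assumes GCC, where -fopt-info-vec asks the compiler to report which loops it vectorised.

```cpp
// Build sketch (GCC assumed; file name is hypothetical):
//   g++ -O3 -fno-math-errno -fopt-info-vec -c sqrt_loop.cpp
// Dropping -fno-math-errno from that line should show the difference.
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of NCrystal: element-wise square root
// of the first n entries of x, written to v.
void sqrt_all( const double* x, double* v, unsigned n )
{
  // Precondition check only; it sits outside the loop.
  assert( n == 0 || ( x != nullptr && v != nullptr ) );
  for ( unsigned i = 0; i < n; ++i )
    v[i] = std::sqrt( x[i] );
}
```

With the flag, code can no longer rely on errno being set by the math functions in the affected translation units, which is another reason to keep it private to the library. In a CMake build, one way to do that would presumably be target_compile_options with the PRIVATE keyword on the library target.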
Further explanation can be found here:
https://stackoverflow.com/questions/57673825/how-to-force-gcc-to-assume-that-a-floating-point-expression-is-non-negative/57674631#57674631