romeric / fastapprox

Approximate and vectorized versions of common mathematical functions
194 stars 35 forks source link

Etheory patch gcc narrowing fix + MSVC support #3

Closed etheory closed 7 years ago

etheory commented 7 years ago

Fixes issue for more strict compilation in GCC (also works in Clang). Tested in a VFX production environment on Linux.

The previous definition of v4sil was trying to cast to a 64bit type, not not exactly matching the expected underlying type (long long).

Under c++11, this fails to compile with a narrowing warning. i.e. in gcc when using the flags "-std=c++11 -Wnarrowing"

By casting to a "long long" instead of "unsigned long long" above, we match the type of the underlying definition, and also the v2dil cast then work correctly to widen the 2 64bit type to the expected final 128bit type.

@pmineiro @romeric

P.S. to verify go to godbolt, and paste the following:

#include <stdint.h>
#include <emmintrin.h>
#include <smmintrin.h>
#include <cmath>

#define __forceinline __attribute__((always_inline))

typedef __m128 v4sf;
typedef __m128i v4si;

#define v4si_to_v4sf _mm_cvtepi32_ps
#define v4sf_to_v4si _mm_cvttps_epi32

#define v4sfl(x) ((const v4sf) { (x), (x), (x), (x) })

#define v2dil(x) ((const v4si) { (x), (x) })
#define v4sil(x) v2dil((((long long) (x)) << 32) | (long long) (x))

v4si test()
{
  v4si result = v4sil(0x807FFFFF);
  return result;
}

int main()
{
  volatile v4si val = test();
  return 0;
}

Then compile with a gcc target, with the following flags:

-O3 -ffast-math -std=c++11 -Wnarrowing

Note that without this change, it won't compile.

pmineiro commented 7 years ago

Nice!

romeric commented 7 years ago

@etheory Good job. This should address #1