p12tic / libsimdpp

Portable header-only C++ low level SIMD library
Boost Software License 1.0

SSE3 optimization of float32 reduce_add(), reduce_max(), and others not working #64

Closed peabody-korg closed 7 years ago

peabody-korg commented 7 years ago

Given that enabling SSE3 implies SSE2 is also enabled, the order of the #elif sections in the code:

#elif SIMDPP_USE_SSE2

float32x4 sum2 = _mm_movehl_ps(a, a);
float32x4 sum = add(a, sum2);
sum = add(sum, permute2<1,0>(sum));
return _mm_cvtss_f32(sum);

#elif SIMDPP_USE_SSE3

float32x4 b = a;
b = _mm_hadd_ps(b, b);
b = _mm_hadd_ps(b, b);
return _mm_cvtss_f32(b);

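causes the SSE2 code to be used even when SSE3 is available: the preprocessor takes the first branch whose condition is true, so the SSE3 branch is never reached.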
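For illustration, here is a minimal standalone sketch of the ordering that avoids the problem: test the more specific instruction set first. This is not the code from the actual fix. The function name `reduce_add4` is hypothetical, and the compiler-defined macros `__SSE3__` and `__SSE2__` are used as stand-ins for `SIMDPP_USE_SSE3` and `SIMDPP_USE_SSE2`.

```cpp
#include <cassert>

#if defined(__SSE3__)
#include <pmmintrin.h>   // SSE3: _mm_hadd_ps
#elif defined(__SSE2__)
#include <emmintrin.h>   // SSE2 (also pulls in the SSE1 float intrinsics)
#endif

// Hypothetical helper: sums four floats starting at p.
// The SSE3 branch is tested first because any build that has SSE3
// also has SSE2; with the opposite order the SSE3 branch is dead code.
inline float reduce_add4(const float* p)
{
    assert(p != nullptr);
#if defined(__SSE3__)
    __m128 b = _mm_loadu_ps(p);
    b = _mm_hadd_ps(b, b);               // [p0+p1, p2+p3, p0+p1, p2+p3]
    b = _mm_hadd_ps(b, b);               // every lane holds the full sum
    return _mm_cvtss_f32(b);
#elif defined(__SSE2__)
    __m128 a = _mm_loadu_ps(p);
    __m128 hi = _mm_movehl_ps(a, a);     // [p2, p3, p2, p3]
    __m128 sum = _mm_add_ps(a, hi);      // lane 0: p0+p2, lane 1: p1+p3
    // Swap lanes 0 and 1 so lane 0 of 'swapped' holds p1+p3.
    __m128 swapped = _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(2, 3, 0, 1));
    sum = _mm_add_ss(sum, swapped);      // lane 0: full sum
    return _mm_cvtss_f32(sum);
#else
    // Portable fallback when neither instruction set is enabled.
    return (p[0] + p[2]) + (p[1] + p[3]);
#endif
}
```

Calling `reduce_add4` on an array holding 1, 2, 3, 4 returns 10 on every path. Only the choice of instructions depends on which branch the preprocessor selects.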

p12tic commented 7 years ago

Fixed in 731f670eecef8732e763df3193dcee7ef08670fa. Thanks!