peabody-korg closed this issue 7 years ago
Given that enabling SSE3 implies that SSE2 is also enabled, the sequence of #elif sections in the code:

    #elif SIMDPP_USE_SSE2
        float32x4 sum2 = _mm_movehl_ps(a, a);
        float32x4 sum = add(a, sum2);
        sum = add(sum, permute2<1,0>(sum));
        return _mm_cvtss_f32(sum);
    #elif SIMDPP_USE_SSE3
        float32x4 b = a;
        b = _mm_hadd_ps(b, b);
        b = _mm_hadd_ps(b, b);
        return _mm_cvtss_f32(b);

causes the SSE2 code to be used even when SSE3 is available, because the preprocessor takes the first branch whose condition is true.
Fixed in 731f670eecef8732e763df3193dcee7ef08670fa. Thanks!