SSE2 implementation of i_to_float32(const float64x4& a) is broken

p12tic / libsimdpp

Portable header-only C++ low level SIMD library

Boost Software License 1.0


SSE2 implementation of i_to_float32(const float64x4& a) is broken #81

Closed: peabody-korg closed this issue 7 years ago

peabody-korg commented 7 years ago

The operation

r2 = move4_l<2>(r2);

shifts the wrong way and leaves r2 full of zeros, so the upper 2 lanes of the result end up filled with zeros.
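To illustrate why the direction matters, here is a minimal sketch using raw SSE2 intrinsics rather than the library's own wrappers. It assumes that move4_l<2> maps to a byte shift toward the lower lanes (like _mm_srli_si128 by 8 bytes); the helper names convert_pair, shift_toward_low and shift_toward_high are hypothetical and not part of libsimdpp.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Plain array wrapper so lane values can be inspected after a store.
struct Lanes { float v[4]; };

static inline Lanes store(__m128 x) {
    Lanes l;
    _mm_storeu_ps(l.v, x);
    return l;
}

// Converts two doubles to floats. _mm_cvtpd_ps writes the results to
// lanes 0 and 1 and zeroes lanes 2 and 3, giving [a, b, 0, 0].
static inline __m128 convert_pair(double a, double b) {
    return _mm_cvtpd_ps(_mm_set_pd(b, a));
}

// Assumed equivalent of the buggy shift: moves data toward lane 0.
// The zeros in lanes 2 and 3 replace the converted values, and more
// zeros are shifted in from the top, so the result is [0, 0, 0, 0].
static inline __m128 shift_toward_low(__m128 x) {
    return _mm_castsi128_ps(_mm_srli_si128(_mm_castps_si128(x), 8));
}

// The intended direction: moves the converted values into lanes 2 and 3,
// giving [0, 0, a, b], ready to be OR-ed with the other half.
static inline __m128 shift_toward_high(__m128 x) {
    return _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8));
}
```

With convert_pair(3.0, 4.0) as input, shift_toward_low discards both converted values, while shift_toward_high keeps them in the upper two lanes.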

Furthermore, it looks like merging the two intermediate vectors with _mm_movelh_ps() might be more efficient than shifting and oring. Compiler optimization might produce that anyway, but _mm_movelh_ps() seems more straightforward.
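For comparison, a rough sketch of both ways of merging the two converted halves, written with raw SSE2 intrinsics. The function names to_float32_shift_or and to_float32_movelh are hypothetical, and this is not the library's actual code; it only assumes each half arrives as a __m128d holding two doubles.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE header)

// Shift-and-OR variant: convert both halves, move the second pair of
// floats into lanes 2 and 3, then OR it with the first pair.
static inline __m128 to_float32_shift_or(__m128d lo, __m128d hi) {
    __m128 r1 = _mm_cvtpd_ps(lo);  // [lo0, lo1, 0, 0]
    __m128 r2 = _mm_cvtpd_ps(hi);  // [hi0, hi1, 0, 0]
    __m128i shifted = _mm_slli_si128(_mm_castps_si128(r2), 8);  // [0, 0, hi0, hi1]
    return _mm_or_ps(r1, _mm_castsi128_ps(shifted));
}

// _mm_movelh_ps variant: copies the lower two lanes of r2 into the
// upper two lanes of the result, replacing the shift and the OR with
// one intrinsic.
static inline __m128 to_float32_movelh(__m128d lo, __m128d hi) {
    __m128 r1 = _mm_cvtpd_ps(lo);  // [lo0, lo1, 0, 0]
    __m128 r2 = _mm_cvtpd_ps(hi);  // [hi0, hi1, 0, 0]
    return _mm_movelh_ps(r1, r2);  // [lo0, lo1, hi0, hi1]
}
```

Both functions should produce the same four floats; whether the second is actually faster depends on the compiler and target, which may already fold the shift and OR into a single instruction.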

peabody-korg commented 7 years ago

The incorrect shift appears to have been fixed in a7f191d5. However, using _mm_movelh_ps() might still make for more efficient code.

p12tic commented 7 years ago

Thanks, this improvement has been applied in 227cc0a8e79ceab4eda89126ca1e98b3ddc82c85.