intel / ARM_NEON_2_x86_SSE

A platform-independent header that allows any C/C++ code containing ARM NEON intrinsic functions to be compiled for x86 target systems, using SIMD intrinsic functions up to AVX2

Add optimized routines for pairwise long adds and _mm_mullo_epi32 #25

Closed: easyaspi314 closed this 5 years ago

easyaspi314 commented 5 years ago

vpaddlq_uN can be implemented as follows:

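{
    /* N is the source lane width (8, 16 or 32); "2N" stands for the widened
       lane width, e.g. N = 8 means _mm_set1_epi16, _mm_srli_epi16, _mm_add_epi16. */
    const __m128i ff = _mm_set1_epi2N((1ULL << N) - 1); /* low N bits of each 2N-bit lane */
    __m128i low = _mm_and_si128(a, ff);   /* even source lanes, zero-extended */
    __m128i high = _mm_srli_epi2N(a, N);  /* odd source lanes, zero-extended */
    return _mm_add_epi2N(low, high);      /* pairwise sums; cannot overflow 2N bits */
}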

The other unsigned pairwise adds work the same way.
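
A minimal SSE2 sketch of the template instantiated for N = 8 and N = 16. The function names paddlq_u8_sse2 and paddlq_u16_sse2 are hypothetical, chosen for illustration rather than taken from the header. The sum of two N-bit unsigned values always fits in 2N bits, so the widened add cannot overflow.

```c
#include <emmintrin.h> /* SSE2 */

/* N = 8: sixteen uint8 lanes -> eight uint16 pairwise sums. */
static __m128i paddlq_u8_sse2(__m128i a)
{
    const __m128i ff = _mm_set1_epi16(0x00FF); /* low byte of each 16-bit lane */
    __m128i low  = _mm_and_si128(a, ff);       /* even bytes, zero-extended */
    __m128i high = _mm_srli_epi16(a, 8);       /* odd bytes, zero-extended */
    return _mm_add_epi16(low, high);
}

/* N = 16: eight uint16 lanes -> four uint32 pairwise sums. */
static __m128i paddlq_u16_sse2(__m128i a)
{
    const __m128i ff = _mm_set1_epi32(0x0000FFFF); /* low half of each 32-bit lane */
    __m128i low  = _mm_and_si128(a, ff);
    __m128i high = _mm_srli_epi32(a, 16);
    return _mm_add_epi32(low, high);
}
```

For example, bytes {255, 255, 1, 2, ...} produce 16-bit lanes {510, 3, ...}.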

vpaddlq_s32 can be implemented as follows:

{
    __m128i top, bot;
    bot = _mm_shuffle_epi32(a, _MM_SHUFFLE(0, 0, 2, 0)); /* lanes 0, 1 = a0, a2 */
    bot = _MM_CVTEPI32_EPI64(bot);                        /* sign-extend to int64 */
    top = _mm_shuffle_epi32(a, _MM_SHUFFLE(0, 0, 3, 1)); /* lanes 0, 1 = a1, a3 */
    top = _MM_CVTEPI32_EPI64(top);
    return _mm_add_epi64(top, bot);                       /* {a0 + a1, a2 + a3} */
}
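
_MM_CVTEPI32_EPI64 is not defined in this thread; it presumably wraps the SSE4.1 _mm_cvtepi32_epi64. To keep the sketch self-contained and SSE2-only, it swaps in a hypothetical helper, cvtepi32_epi64_sse2, which sign-extends by interleaving each 32-bit lane with its own sign mask. Both function names are illustrative only.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical SSE2 stand-in for _MM_CVTEPI32_EPI64:
   sign-extends the two low int32 lanes of x into two int64 lanes. */
static __m128i cvtepi32_epi64_sse2(__m128i x)
{
    __m128i sign = _mm_srai_epi32(x, 31);  /* 0 or -1 per 32-bit lane */
    return _mm_unpacklo_epi32(x, sign);    /* {x0, s0, x1, s1} = {(int64)x0, (int64)x1} */
}

/* vpaddlq_s32: four int32 lanes -> two int64 pairwise sums. */
static __m128i paddlq_s32_sse2(__m128i a)
{
    /* bot = {a0, a2}, top = {a1, a3}; the upper two lanes are don't-cares. */
    __m128i bot = _mm_shuffle_epi32(a, _MM_SHUFFLE(0, 0, 2, 0));
    __m128i top = _mm_shuffle_epi32(a, _MM_SHUFFLE(0, 0, 3, 1));
    bot = cvtepi32_epi64_sse2(bot);
    top = cvtepi32_epi64_sse2(top);
    return _mm_add_epi64(top, bot);        /* {a0 + a1, a2 + a3} */
}
```

Widening before the add matters: INT32_MAX + INT32_MAX would wrap in 32 bits but is exact in the 64-bit lanes.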

And _mm_mullo_epi32 (an SSE4.1 instruction) can be emulated with SSE2 using the same routine that GCC generates for vector extensions (Clang uses a similar method, but it uses pshufd, which is slow on pre-Penryn chips):

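{
    /* _mm_mul_epu32 multiplies only the even 32-bit lanes (0 and 2). */
    __m128i a_high = _mm_srli_epi64(a, 32);         /* lanes 1, 3 of a -> 0, 2 */
    __m128i low = _mm_mul_epu32(a, b);              /* 64-bit a0*b0, a2*b2 */
    __m128i b_high = _mm_srli_epi64(b, 32);         /* lanes 1, 3 of b -> 0, 2 */
    __m128i high = _mm_mul_epu32(a_high, b_high);   /* 64-bit a1*b1, a3*b3 */
    /* Keep the low 32 bits of each product: lanes 0 and 2 -> lanes 0 and 1. */
    low = _mm_shuffle_epi32(low, _MM_SHUFFLE(0, 0, 2, 0));
    high = _mm_shuffle_epi32(high, _MM_SHUFFLE(0, 0, 2, 0));
    return _mm_unpacklo_epi32(low, high);           /* {a0*b0, a1*b1, a2*b2, a3*b3} */
}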
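
Why an unsigned multiply (_mm_mul_epu32) is enough for the signed _mm_mullo_epi32: only the low 32 bits of each product are kept, and those bits are the same whether a lane is read as signed or unsigned. Writing a_u, b_u for the unsigned reading of a lane's bits and α, β for its sign bits:

```latex
\begin{aligned}
a_s &= a_u - 2^{32}\alpha, \qquad b_s = b_u - 2^{32}\beta, \qquad \alpha, \beta \in \{0, 1\} \\
a_s b_s &= a_u b_u - 2^{32}(\alpha b_u + \beta a_u) + 2^{64}\alpha\beta \\
        &\equiv a_u b_u \pmod{2^{32}}
\end{aligned}
```

So the shuffles only need to gather the low halves of the four 64-bit products.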

Zvictoria commented 5 years ago

Hi, easyaspi314. Thanks for your input! It helps indeed. The only thing that has to be changed is the _mm_set1_epi64x call: unfortunately it is not available in some compilers, e.g. the Visual Studio compiler for 32-bit targets. So if you change it to another "set" intrinsic (an _epi32 one with the corresponding arguments), I will merge your commit with great pleasure. Thanks in advance!
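
A sketch of the N = 32 case (vpaddlq_u32) along the lines requested here, with the 64-bit mask built from four 32-bit values by _mm_set_epi32 instead of _mm_set1_epi64x. The function name paddlq_u32_sse2 is hypothetical.

```c
#include <emmintrin.h> /* SSE2 */

/* N = 32: four uint32 lanes -> two uint64 pairwise sums. */
static __m128i paddlq_u32_sse2(__m128i a)
{
    /* _mm_set_epi32 lists lanes from high to low: lanes 3..0 = 0, -1, 0, -1,
       i.e. 0x00000000FFFFFFFF in each 64-bit lane. */
    const __m128i ff = _mm_set_epi32(0, -1, 0, -1);
    __m128i low  = _mm_and_si128(a, ff);   /* lanes 0 and 2, zero-extended */
    __m128i high = _mm_srli_epi64(a, 32);  /* lanes 1 and 3, zero-extended */
    return _mm_add_epi64(low, high);
}
```

Only the mask construction differs from the _mm_set1_epi64x form; the AND, shift and add are unchanged.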

easyaspi314 commented 5 years ago

I also added an SSE4.1 version that uses _mm_blend_epi16, which only requires a pxor (to create a zero register) instead of a movdqa (to load the mask).
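
The SSE4.1 variant itself is not shown in this thread. The following is a sketch of how _mm_blend_epi16 can replace the AND mask: blending with _mm_setzero_si128() (a pxor) zeroes the high half of each widened lane. Because pblendw selects whole 16-bit lanes, this covers only N = 16 and N = 32; N = 8 still needs a mask. The names paddlq_u16_blend, paddlq_u32_blend and the SSE41_FN macro are hypothetical, and the target attribute is GCC/Clang-specific.

```c
#include <smmintrin.h> /* SSE4.1 (also pulls in SSE2) */

/* Hypothetical helper: compile only these functions for SSE4.1, so the file
   builds without -msse4.1; the CPU running them must support SSE4.1. */
#define SSE41_FN __attribute__((target("sse4.1")))

/* N = 16: imm 0xAA takes 16-bit lanes 1, 3, 5, 7 from zero. */
static SSE41_FN __m128i paddlq_u16_blend(__m128i a)
{
    __m128i low  = _mm_blend_epi16(a, _mm_setzero_si128(), 0xAA);
    __m128i high = _mm_srli_epi32(a, 16);
    return _mm_add_epi32(low, high);
}

/* N = 32: imm 0xCC takes 16-bit lanes 2, 3, 6, 7 from zero,
   i.e. clears the high half of each 64-bit lane. */
static SSE41_FN __m128i paddlq_u32_blend(__m128i a)
{
    __m128i low  = _mm_blend_epi16(a, _mm_setzero_si128(), 0xCC);
    __m128i high = _mm_srli_epi64(a, 32);
    return _mm_add_epi64(low, high);
}
```

The blend immediate is encoded in the instruction, so no mask constant has to be loaded from memory.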

Zvictoria commented 5 years ago

Thanks again for this useful commit!