p12tic / libsimdpp

Portable header-only C++ low level SIMD library
Boost Software License 1.0

simdpp::blend() with mask_int32<8> gets compiled into PBLENDVB (not VPBLENDD) #56

Closed. eriksjolund closed this issue 7 years ago.

eriksjolund commented 7 years ago

I believe VPBLENDD is faster than PBLENDVB. The former selects whole 32-bit dwords, while the latter selects individual bytes.

For that reason I expected a simdpp::blend() that uses a mask_int32<8> to be compiled into VPBLENDD instead of PBLENDVB.

My test program gets compiled into PBLENDVB:

    #include <cstdint>
    #include <iostream>
    #include <limits>
    #include <simdpp/simd.h>

    int main() {
      simdpp::uint32<8> v1 = simdpp::make_uint(std::numeric_limits<std::uint32_t>::max(), 0);
      simdpp::uint32<8> v2 = simdpp::make_uint(std::numeric_limits<std::uint32_t>::max());
      const auto mask = simdpp::cmp_eq(v1, v2);
      v1 = simdpp::blend(v1, v2, mask);
      // Just output something so that the compiler does not optimize away everything
      std::cout << simdpp::reduce_max(v1) << "\n";
    }
    $ g++-7.1 -I/home/user/libsimdpp/inst/include/libsimdpp-2.0 -I. -std=c++14  -msse4.1  -mavx2 -O3  -D SIMDPP_ARCH_X86_AVX2 -save-temps /home/user/test.cc
    $ grep blend test.s
        vpblendvb   %ymm1, %ymm1, %ymm0, %ymm1

Do you know why PBLENDVB is being used and not VPBLENDD?

p12tic commented 7 years ago

Hi, thanks for the bug report.

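VPBLENDD only accepts an immediate mask that is encoded into the instruction itself, in the same way that VPSHUFD and VSHUFPS take their shuffle pattern as an immediate. That instruction cannot blend using a mask held in a register. The mask returned by cmp_eq() is a runtime value in a register, so a variable blend such as PBLENDVB has to be used instead.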