p12tic / libsimdpp

Portable header-only C++ low level SIMD library
Boost Software License 1.0

Feature request: Add compile-time blend masks #130

Open eriksjolund opened 5 years ago

eriksjolund commented 5 years ago

The intrinsic blend functions have different performance characteristics. Among them, _mm256_blend_epi32() is the fastest, but its mask must be encoded into a const int imm8 at compile time. If I understand correctly, that prevents its use in the current libsimdpp blend implementation (see also https://github.com/p12tic/libsimdpp/issues/56).

For masks that are already known at compile time, I think it would be good to represent them in a new way. For instance, the blend mask could be represented as a tuple from the boost::hana library:

    #include <boost/hana.hpp>
    namespace hana = boost::hana;

    auto mask = hana::make_tuple(
      hana::true_c,  hana::true_c,
      hana::true_c,  hana::true_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::true_c,  hana::true_c, 
      hana::true_c,  hana::true_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c
    );

The immediate mask for _mm256_blend_epi32() could then be computed at compile time. I made a proof-of-concept implementation of this in

https://github.com/eriksjolund/compile-time-simd-blend-mask
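
To illustrate the idea without depending on Boost.Hana, here is a minimal C++14 sketch of how such an immediate might be computed. It is not the code from the repository above, and `blend_epi32_imm8` is a hypothetical name that does not exist in libsimdpp. The sketch assumes the 32 flags are per-byte, that the first flag corresponds to the lowest byte of the vector, and that each 32-bit lane must be uniformly true or false to be expressible as one bit of imm8.

```cpp
#include <cstddef>

// Hypothetical helper (not part of libsimdpp or the linked repository).
// Takes one compile-time flag per byte of a 256-bit vector and folds each
// group of four bytes (one 32-bit lane) into one bit of an 8-bit immediate.
// Returns -1 if a lane mixes true and false bytes, because such a mask
// cannot be expressed as a per-lane immediate.
template <bool... Bytes>
constexpr int blend_epi32_imm8() {
    static_assert(sizeof...(Bytes) == 32, "expected 32 per-byte flags");
    const bool bytes[] = {Bytes...};
    int imm = 0;
    for (std::size_t lane = 0; lane < 8; ++lane) {
        const bool first = bytes[lane * 4];
        // All four bytes of the lane must agree with the first one.
        for (std::size_t i = 1; i < 4; ++i) {
            if (bytes[lane * 4 + i] != first) {
                return -1;
            }
        }
        if (first) {
            imm |= 1 << lane;  // bit `lane` selects this 32-bit element
        }
    }
    return imm;
}

// The same pattern as the hana tuple above: bytes 0-3 and 12-15 are true,
// which corresponds to lanes 0 and 3.
constexpr int example_imm = blend_epi32_imm8<
    true,  true,  true,  true,   false, false, false, false,
    false, false, false, false,  true,  true,  true,  true,
    false, false, false, false,  false, false, false, false,
    false, false, false, false,  false, false, false, false
>();

static_assert(example_imm == 0x09, "lanes 0 and 3 should be selected");
```

Because `example_imm` is a constant expression, it should be usable as the third argument of `_mm256_blend_epi32(a, b, example_imm)` when compiling with AVX2 enabled. A caller could add `static_assert(example_imm >= 0, ...)` to reject masks that are not uniform within a lane and fall back to a variable blend in that case.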