Feature request: Add compile-time blend masks

The intrinsics blend functions

__m256i _mm256_blendv_epi8(__m256i v1, __m256i v2, __m256i mask)
__m256i _mm256_blend_epi16(__m256i a, __m256i b, const int imm8)
__m256i _mm256_blend_epi32(__m256i a, __m256i b, const int imm8)

have different performance characteristics. Among them, _mm256_blend_epi32() is the fastest, but its mask must be encoded as a const int imm8 at compile time. If I understand correctly, that prevents its use in the current blend implementation of libsimdpp (see also https://github.com/p12tic/libsimdpp/issues/56).

For masks that are already known at compile time, I think it would be good to represent them in a way that keeps the values available to the compiler. For instance, the blend mask could be represented as a boost::hana tuple with one entry per byte:

    auto mask = hana::make_tuple(
      hana::true_c,  hana::true_c,
      hana::true_c,  hana::true_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::true_c,  hana::true_c, 
      hana::true_c,  hana::true_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 

      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c, 
      hana::false_c, hana::false_c
    );

The immediate mask for _mm256_blend_epi32() could then be computed at compile time. I made a proof-of-concept implementation of this in

https://github.com/eriksjolund/compile-time-simd-blend-mask
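
To illustrate the idea without depending on Boost.Hana, here is a minimal sketch of how a per-byte mask could be collapsed into the imm8 value. It is not the code from the repository above: byte_mask and blend_epi32_imm8 are hypothetical names made up for this example, and a plain template parameter pack stands in for the hana tuple. The sketch assumes that true means "take this byte from the second operand" and that bit i of imm8 selects 32-bit lane i, which matches the documented behaviour of _mm256_blend_epi32().

```cpp
#include <array>
#include <cstddef>

// Hypothetical type-level mask: one bool per byte of a 256-bit vector.
// true = take the byte from the second blend operand.
// This stands in for the hana tuple shown in the issue.
template <bool... Bytes>
struct byte_mask {};

// Returns the imm8 for a blend on 32-bit lanes, or -1 if some lane mixes
// true and false bytes (such a mask needs a finer-grained blend).
template <bool... Bytes>
constexpr int blend_epi32_imm8(byte_mask<Bytes...>) {
    static_assert(sizeof...(Bytes) == 32,
                  "expected one flag per byte of a 256-bit vector");
    constexpr std::array<bool, 32> bytes{{Bytes...}};
    int imm8 = 0;
    for (std::size_t lane = 0; lane < 8; ++lane) {
        const bool first = bytes[4 * lane];
        // All four bytes of a lane must agree for a 32-bit blend to apply.
        for (std::size_t i = 1; i < 4; ++i) {
            if (bytes[4 * lane + i] != first) return -1;
        }
        if (first) imm8 |= 1 << lane;  // bit i selects lane i
    }
    return imm8;
}

// Same pattern as the tuple in the issue: lanes 0 and 3 are selected.
constexpr byte_mask<
    true,  true,  true,  true,
    false, false, false, false,
    false, false, false, false,
    true,  true,  true,  true,
    false, false, false, false,
    false, false, false, false,
    false, false, false, false,
    false, false, false, false
> example_mask{};

constexpr int example_imm8 = blend_epi32_imm8(example_mask);
static_assert(example_imm8 == 0x09, "lanes 0 and 3 selected");
```

Because example_imm8 is a constant expression, it could be passed directly as the third argument of _mm256_blend_epi32() when building with AVX2 enabled. A result of -1 could be used to fall back to _mm256_blend_epi16() or _mm256_blendv_epi8() instead.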
