We already expand basic truncation intrinsics to the trunc+shuffle sequence:
```cpp
static __inline__ __m128i __DEFAULT_FN_ATTRS128
_mm_cvtepi64_epi8 (__m128i __A)
{
  return (__m128i)__builtin_shufflevector(
      __builtin_convertvector((__v2di)__A, __v2qi), (__v2qi){0, 0}, 0, 1, 2, 3,
      3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3);
}
```
But the predicated (masked) variants are not, as it's tricky to correctly merge the predicate select of the lower elements with the zeroing of the upper elements:
```cpp
static __inline__ __m128i __DEFAULT_FN_ATTRS128
_mm_mask_cvtepi64_epi8 (__m128i __O, __mmask8 __M, __m128i __A)
{
  return (__m128i) __builtin_ia32_pmovqb128_mask ((__v2di) __A,
                                                  (__v16qi) __O, __M);
}

static __inline__ __m128i __DEFAULT_FN_ATTRS128
_mm_maskz_cvtepi64_epi8 (__mmask8 __M, __m128i __A)
{
  return (__m128i) __builtin_ia32_pmovqb128_mask ((__v2di) __A,
                                                  (__v16qi) _mm_setzero_si128 (),
                                                  __M);
}
```
So this is likely to require some improvements to the DAG backend as well as the Clang frontend.
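
To make the merge problem concrete, here is a rough scalar reference model of what the masked builtins are expected to compute, as I understand the documented `VPMOVQB` behaviour for a 128-bit destination: the two low bytes are a per-element select between the truncated value and the passthrough, while bytes 2..15 are zeroed regardless of the mask or the passthrough. The type aliases and function names below are hypothetical and exist only for illustration; none of this is Clang or LLVM API.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins for <2 x i64> and <16 x i8>; not Clang types.
using V2i64 = std::array<std::int64_t, 2>;
using V16i8 = std::array<std::int8_t, 16>;

// Scalar model of _mm_mask_cvtepi64_epi8(src, mask, a).
V16i8 mask_cvtepi64_epi8_model(const V16i8 &src, std::uint8_t mask,
                               const V2i64 &a) {
  V16i8 dst{}; // value-initialised, so bytes 2..15 stay zero
  for (int i = 0; i < 2; ++i) {
    // Truncation keeps only the low 8 bits of each 64-bit element.
    std::int8_t truncated =
        static_cast<std::int8_t>(static_cast<std::uint8_t>(a[i] & 0xFF));
    // Mask bit set: take the truncated value. Clear: keep the passthrough.
    dst[i] = ((mask >> i) & 1) ? truncated : src[i];
  }
  return dst;
}

// Scalar model of _mm_maskz_cvtepi64_epi8(mask, a): same as above with an
// all-zero passthrough.
V16i8 maskz_cvtepi64_epi8_model(std::uint8_t mask, const V2i64 &a) {
  return mask_cvtepi64_epi8_model(V16i8{}, mask, a);
}
```

Note that the upper bytes of the passthrough never reach the result, so a generic expansion cannot simply select against the full 16-byte passthrough; the select has to be confined to the low elements before the zero-extending shuffle.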