The por of xmm1 with 112 (0b0111_0000) is a no-op and should be optimized out: pshufb ignores bits 4-6 of each mask byte (counting from bit 0), using only the low four bits as the index and bit 7 as the zeroing flag:
.LCPI0_0:
        .zero   16,112
shuffle_or:
        por     xmm1, xmmword ptr [rip + .LCPI0_0]
        pshufb  xmm0, xmm1
        ret
Writing _mm_shuffle_epi8(bytes, _mm_set1_epi8(127)) in the source emits a pshufb with a mask of 15 in the assembly, so LLVM seems to be aware of this optimization on some level but fails to apply it here.
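To make the no-op claim concrete, here is a minimal scalar sketch of a single byte lane of the 128-bit pshufb, assuming the documented semantics (bit 7 of the control byte zeroes the lane, otherwise the low four bits select a source byte). The helper names pshufb_lane and or_is_noop are hypothetical and only serve this illustration; they are not part of LLVM or of any intrinsics header.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one byte lane of 128-bit pshufb (assumed semantics):
 * if bit 7 of the control byte is set the result is 0, otherwise the
 * low four bits index into the 16 source bytes. Bits 4-6 are unused. */
static uint8_t pshufb_lane(const uint8_t src[16], uint8_t ctrl)
{
    if (ctrl & 0x80)
        return 0;
    return src[ctrl & 0x0F];
}

/* Returns 1 if OR-ing or_bits into every possible control byte leaves
 * the lane result unchanged under the model above, and 0 otherwise. */
static int or_is_noop(uint8_t or_bits)
{
    uint8_t src[16];
    for (int i = 0; i < 16; i++)
        src[i] = (uint8_t)(0xA0 + i); /* distinct, nonzero source bytes */

    for (int ctrl = 0; ctrl < 256; ctrl++) {
        uint8_t plain = pshufb_lane(src, (uint8_t)ctrl);
        uint8_t ored  = pshufb_lane(src, (uint8_t)(ctrl | or_bits));
        if (plain != ored)
            return 0;
    }
    return 1;
}
```

Under this model, or_is_noop(0x70) returns 1, matching the claim that the por with 112 can be dropped, while or_is_noop(0x80) returns 0 because setting bit 7 zeroes the lane. The same model gives identical results for control bytes 127 and 15, which is consistent with the constant folding observed for _mm_set1_epi8(127).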
Clang example: https://godbolt.org/z/ec4P4j78b, flags: -O3 -march=x86-64-v2. Not Clang-specific; the same behaviour occurs on Rust nightly.