I had this code: (Godbolt link)
It produces this ASM:
Looks like `LCPI0_1` was sort of put there but our code ultimately decided to use `vbroadcasti128` instead so only the 16-byte version in `LCPI0_2` was necessary.
Interestingly, when I flip the `vpshufb(vec, x >> @splat(4)) + @select(u8, x == @as(@TypeOf(x), @splat(0)), vpshufb(vec, x), @as(@TypeOf(x), @splat(0)));` to be `@select(u8, x == @as(@TypeOf(x), @splat(0)), vpshufb(vec, x), @as(@TypeOf(x), @splat(0))) + vpshufb(vec, x >> @splat(4));`, the problem disappears:
Here is a dump of the offending code via `zig build-obj ./src/llvm_code.zig -O ReleaseFast -target x86_64-linux -mcpu znver3 --verbose-llvm-ir -fstrip >llvm_code.ll 2>&1`