integers bitcast to vectors, then coerced to vectors of larger integers, need optimization

Validark commented 7 months ago

Zig Version

0.12.0-dev.2284+9b714e019

Steps to Reproduce and Observed Behavior

This code results in horrible assembly:

fn foo(x: u32) @Vector(8, u32) {
    return @as(@Vector(8, u4), @bitCast(x));
}

Expected Behavior

Should be compiled to the same thing as:

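const std = @import("std");

fn bar(x: u32) @Vector(8, u32) {
    // Broadcast x to all 8 lanes, then shift lane i right by 4*i bits.
    const vec: @Vector(8, u32) = @splat(x);
    const vec2 = std.simd.iota(u5, 8) << @splat(2);
    // Keep only the low nibble of each lane.
    return (vec >> vec2) & @as(@Vector(8, u32), @splat(0xF));
}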

The latter compiles as follows for a recent x86_64 target with AVX-512 (needed for the {1to8} broadcast operand on vpandd):

.LCPI1_0:
        .long   0
        .long   4
        .long   8
        .long   12
        .long   16
        .long   20
        .long   24
        .long   28
.LCPI1_1:
        .long   15
bar:
        vpbroadcastd    ymm0, edi
        vpsrlvd ymm0, ymm0, ymmword ptr [rip + .LCPI1_0]
        vpandd  ymm0, ymm0, dword ptr [rip + .LCPI1_1]{1to8}
        ret
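
For reference, the equivalence claimed between the two functions can be checked with a small test. This is only a sketch: it assumes a little-endian target, where element 0 of the bitcast vector holds the least significant nibble, and the test name and sample input are made up for illustration.

```zig
const std = @import("std");

fn foo(x: u32) @Vector(8, u32) {
    return @as(@Vector(8, u4), @bitCast(x));
}

fn bar(x: u32) @Vector(8, u32) {
    const vec: @Vector(8, u32) = @splat(x);
    const vec2 = std.simd.iota(u5, 8) << @splat(2);
    return (vec >> vec2) & @as(@Vector(8, u32), @splat(0xF));
}

test "foo and bar extract the same nibbles (little-endian assumed)" {
    // Nibble i of this constant (counting from the low end) has value i.
    const x: u32 = 0x76543210;
    const expected: [8]u32 = .{ 0, 1, 2, 3, 4, 5, 6, 7 };
    try std.testing.expectEqual(expected, @as([8]u32, foo(x)));
    try std.testing.expectEqual(expected, @as([8]u32, bar(x)));
}
```

Running this with zig test only confirms that both functions return the same values; it says nothing about the generated code, which is what this issue is about.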

Validark commented 7 months ago

@mlugg Is this likely to be a problem that should be solved in the Zig compiler itself or in LLVM?

mlugg commented 7 months ago

Ideally LLVM would just be improved to optimize this code correctly - however, if that doesn't happen, we may be able to make the compiler emit different LLVM IR which optimizes better.
