llvm / llvm-project

[AVX2] `vpsllvq` builtin-semantics are not recognized by LLVM vectors #109888

Open Validark opened 1 month ago

Validark commented 1 month ago

I wrote some Zig code that tries to express the semantics of `vpsllvq`:

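```zig
export fn foo(vec: @Vector(4, u64)) @Vector(4, u64) {
    return @select(
        u64,
        // Lanes whose shift count is 64 or more should produce 0, as vpsllvq does.
        vec < @as(@Vector(4, u64), @splat(64)),
        // @truncate narrows each count to the u6 that `<<` on u64 requires.
        @as(@Vector(4, u64), @splat(1)) << @truncate(vec),
        @as(@Vector(4, u64), @splat(0))
    );
}
```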
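For reference, a scalar Python model of a single 64-bit lane. Per Intel's documentation, `vpsllvq` shifts each quadword left by the count in the corresponding lane of the second operand and writes 0 when that count is greater than 63. The helpers `vpsllvq_lane` and `zig_foo_lane` are hypothetical names, and the Zig `@truncate` to `u6` is modelled as masking the count with 63. This is a sketch for checking semantics only, not a model of the instruction itself.

```python
MASK64 = (1 << 64) - 1  # keep results within one 64-bit lane

def vpsllvq_lane(x: int, count: int) -> int:
    """Model one 64-bit lane of vpsllvq: counts above 63 produce 0."""
    return (x << count) & MASK64 if count < 64 else 0

def zig_foo_lane(v: int) -> int:
    """Model one lane of the Zig foo(): @truncate keeps the low 6 bits of the
    shift count, and @select substitutes 0 whenever v >= 64."""
    shifted = (1 << (v & 63)) & MASK64
    return shifted if v < 64 else 0

# Every small shift count plus a few large ones, up to the u64 maximum.
samples = list(range(256)) + [1000, 2**32, 2**63, MASK64]
assert all(zig_foo_lane(v) == vpsllvq_lane(1, v) for v in samples)
print("Zig expression matches vpsllvq(1, v) on all sampled lanes")
```

So, lane for lane, `foo` is `vpsllvq` applied to a splat of 1, which is why a single shift (plus loading that constant) looks achievable.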

Optimized LLVM IR:

```llvm
define dso_local range(i64 0, -9223372036854775807) <4 x i64> @foo(<4 x i64> %0) local_unnamed_addr {
Entry:
  %1 = icmp ult <4 x i64> %0, <i64 64, i64 64, i64 64, i64 64>
  %2 = and <4 x i64> %0, <i64 63, i64 63, i64 63, i64 63>
  %3 = shl nuw <4 x i64> <i64 1, i64 1, i64 1, i64 1>, %2
  %4 = select <4 x i1> %1, <4 x i64> %3, <4 x i64> zeroinitializer
  ret <4 x i64> %4
}
```

Compiled for Zen 3:

```asm
.LCPI0_0:
        .quad   -9223372036854775808
.LCPI0_1:
        .quad   -9223372036854775745
.LCPI0_2:
        .quad   63
.LCPI0_3:
        .quad   1
foo:
        vpbroadcastq    ymm1, qword ptr [rip + .LCPI0_0]
        vpbroadcastq    ymm4, qword ptr [rip + .LCPI0_2]
        vpbroadcastq    ymm2, qword ptr [rip + .LCPI0_1]
        vpbroadcastq    ymm3, qword ptr [rip + .LCPI0_3]
        vpxor   ymm1, ymm0, ymm1
        vpand   ymm0, ymm0, ymm4
        vpcmpgtq        ymm1, ymm1, ymm2
        vpsllvq ymm0, ymm3, ymm0
        vpandn  ymm0, ymm1, ymm0
        ret
```

I was expecting just a `vpsllvq` instruction.
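The extra instructions come from lowering the `icmp ult` literally. AVX2 has a signed 64-bit compare (`vpcmpgtq`) but no unsigned one, so the compiler flips the sign bit of both operands: `.LCPI0_0` is the sign bit (INT64_MIN), and `.LCPI0_1` is `63 ^ INT64_MIN` = -9223372036854775745. The resulting mask marks lanes with `x > 63`, and `vpandn ymm0, ymm1, ymm0` computes `(~mask) & shifted` to zero them. A small Python check of that sequence, with hypothetical helper names:

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63  # .LCPI0_0 read as an unsigned 64-bit value

def as_i64(u: int) -> int:
    """Reinterpret a 64-bit unsigned value as two's-complement signed."""
    return u - (1 << 64) if u & SIGN_BIT else u

def unsigned_gt_via_signed(x: int, y: int) -> bool:
    """Unsigned x > y computed the way the vpxor + vpcmpgtq pair does it."""
    return as_i64(x ^ SIGN_BIT) > as_i64(y ^ SIGN_BIT)

# The compare operand loaded from .LCPI0_1 is 63 with its sign bit flipped.
assert as_i64(63 ^ SIGN_BIT) == -9223372036854775745

def foo_lane_as_compiled(x: int) -> int:
    """One lane of the Zen 3 output: vpand, vpsllvq, then vpandn with the mask."""
    amount = x & 63                                    # vpand with .LCPI0_2
    shifted = (1 << amount) & MASK64                   # vpsllvq (amount < 64 here)
    too_big = MASK64 if unsigned_gt_via_signed(x, 63) else 0  # vpcmpgtq mask
    return ~too_big & shifted & MASK64                 # vpandn: (~mask) & shifted

for x in list(range(200)) + [2**63 - 1, 2**63, MASK64]:
    assert unsigned_gt_via_signed(x, 63) == (x > 63)
    assert foo_lane_as_compiled(x) == ((1 << x) if x < 64 else 0)
```

Since `vpsllvq` already returns 0 for counts above 63, the `vpand`, `vpxor`, `vpcmpgtq` and `vpandn` only recompute what the shift would produce anyway.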

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-x86

Author: Niles Salter (Validark)

RKSimon commented 3 weeks ago

This can be handled more generally, by dropping the `and` mask and freezing the input, i.e. rewriting `@src` into `@tgt`:

```llvm
define dso_local range(i64 0, -9223372036854775807) <4 x i64> @src(<4 x i64> %0) local_unnamed_addr {
Entry:
  %1 = icmp ult <4 x i64> %0, <i64 64, i64 64, i64 64, i64 64>
  %2 = and <4 x i64> %0, <i64 63, i64 63, i64 63, i64 63>
  %3 = shl nuw <4 x i64> <i64 1, i64 1, i64 1, i64 1>, %2
  %4 = select <4 x i1> %1, <4 x i64> %3, <4 x i64> zeroinitializer
  ret <4 x i64> %4
}

define dso_local range(i64 0, -9223372036854775807) <4 x i64> @tgt(<4 x i64> %0) local_unnamed_addr {
Entry:
  %1 = freeze <4 x i64> %0
  %2 = icmp ult <4 x i64> %1, <i64 64, i64 64, i64 64, i64 64>
  %3 = shl nuw <4 x i64> <i64 1, i64 1, i64 1, i64 1>, %1
  %4 = select <4 x i1> %2, <4 x i64> %3, <4 x i64> zeroinitializer
  ret <4 x i64> %4
}
```
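One way to see why the rewrite is valid: in `@src` the `and` with 63 only changes lanes whose shift amount is 64 or more, and the `select` replaces exactly those lanes with zero, so the mask has no observable effect. In `@tgt` the unmasked `shl` yields poison for such lanes, which is harmless because the `select` does not pick that operand; the `freeze` presumably keeps the `icmp` and the `shl` consistent when the input is `undef`. The Python sketch below (hypothetical function names) compares the two forms lane by lane on concrete values only; it does not model poison or `undef`, which is what the Alive2 proof covers.

```python
MASK64 = (1 << 64) - 1

def src_lane(x: int) -> int:
    """@src: icmp ult x, 64 ; and x, 63 ; shl 1, masked ; select."""
    in_range = x < 64
    shifted = (1 << (x & 63)) & MASK64
    return shifted if in_range else 0

def tgt_lane(x: int) -> int:
    """@tgt: no mask. The shift is only evaluated for in-range lanes, mirroring
    how the select discards the poison an out-of-range shl would produce."""
    return ((1 << x) & MASK64) if x < 64 else 0

samples = list(range(300)) + [2**32, 2**63, MASK64]
assert all(src_lane(x) == tgt_lane(x) for x in samples)
print("src and tgt agree on", len(samples), "sampled lanes")
```

Once the mask is gone, the remaining `select(icmp ult x, 64; shl 1, x; 0)` has the same per-lane meaning as `vpsllvq`, which is the shape a backend would need to match.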
RKSimon commented 2 weeks ago

Alive2: https://alive2.llvm.org/ce/z/MUmPV-