llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.1k stars 11.1k forks source link

`1 << cttz(z)` should be folded into `z & -z` even on machines with cttz built-in #91305

Open Validark opened 2 months ago

Validark commented 2 months ago

https://alive2.llvm.org/ce/z/on8IIK suggests 1 << cttz(z) = z & -z is already folded by instcombine

Originally posted by @RKSimon in https://github.com/llvm/llvm-project/issues/90000#issuecomment-2081442262

This code:

export fn bar(y: u64) u64 {
    if (y == 0) return 0;
    return @as(u64, 1) << @intCast(@ctz(y));
}

Gives me this emit for the risc-v sifive u74:

bar:
        neg     a1, a0
        and     a0, a0, a1
        ret

Exactly what we want.

Now, let's "upgrade" to the sifive x280:

bar:
        ctz     a1, a0
        li      a2, 1
        sll     a1, a2, a1
        bnez    a0, .LBB0_2
        li      a1, 0
.LBB0_2:
        mv      a0, a1
        ret

Oops! Same problem occurs on x86 Zen 3:

bar:
        tzcnt   rax, rdi
        mov     ecx, 1
        shlx    rax, rcx, rax
        cmovb   rax, rdi
        ret

And on aarch64 apple_latest:

bar:
        rbit    x8, x0
        clz     x8, x8
        mov     w9, #1
        lsl     x8, x9, x8
        cmp     x0, #0
        csel    x0, xzr, x8, eq
        ret

Godbolt link

Related: https://github.com/llvm/llvm-project/issues/84763 https://github.com/llvm/llvm-project/issues/90000

dtcxzyw commented 2 months ago

Can you try https://github.com/llvm/llvm-project/pull/85066?

Validark commented 2 months ago

@dtcxzyw Looks like the transformation does not happen due to the AND with 63.

https://alive2.llvm.org/ce/z/_tyFCK

dtcxzyw commented 2 months ago

@dtcxzyw Looks like the transformation does not happen due to the AND with 63.

https://alive2.llvm.org/ce/z/_tyFCK

This is a codegen patch. And it hasn't been merged into main. You should apply this patch and run llc on your local machine.

does not happen due to the AND with 63.

is_zero_poison should be marked as true (based on the dominating condition), then the and inst will be eliminated.

BTW, opt -O3 already folds this pattern as you expected. Doesn't zig compiler use the default optimization pipeline? godbolt: https://zig.godbolt.org/z/7qoEjox3K