llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.18k stars 11.63k forks source link

Clang trunk does not optimize trivial case #53861

Open socketpair opened 2 years ago

socketpair commented 2 years ago

https://godbolt.org/z/65Tza7cdf


#include <stdint.h>

extern uint16_t qwe, asd, zxc, zzz;

int firewall(const uint8_t *restrict data) {
    const uint8_t ip_proto = *data;
    const uint16_t dst_port = *((const uint16_t *)data + 32);

    if (ip_proto == 17 && dst_port == qwe) return 1;

    if (ip_proto == 17 && dst_port == asd) return 1;

    if (ip_proto == 17 && dst_port == zxc) return 1;

    if (ip_proto == 17 && dst_port == zzz) return 1;

    return 0;
}
firewall:                               # @firewall
        mov     dl, byte ptr [rdi]
        movzx   ecx, word ptr [rdi + 64]
        mov     eax, 1
        cmp     dl, 17
        jne     .LBB0_2
        cmp     cx, word ptr [rip + qwe]
        jne     .LBB0_2
.LBB0_7:
        ret
.LBB0_2:
        cmp     dl, 17
        jne     .LBB0_4
        cmp     cx, word ptr [rip + asd]
        je      .LBB0_7
.LBB0_4:
        cmp     dl, 17
        jne     .LBB0_6
        cmp     cx, word ptr [rip + zxc]
        je      .LBB0_7
.LBB0_6:
        cmp     dl, 17
        sete    al
        cmp     cx, word ptr [rip + zzz]
        sete    cl
        and     cl, al
        movzx   eax, cl
        ret

Compiled with -O3 to x86_64. cmp dl, 17 is repeated for some reason. This is a bug as I think.

socketpair commented 2 years ago

Interesting, extern volatile uint16_t qwe, asd, zxc, zzz; fixes the problem. Or changing uint16_t to uint32_t. What's happening ?

socketpair commented 2 years ago

@EugeneZelenko who should I trigger regarding this task ?

socketpair commented 2 years ago

CLang 12 does not have this problem: On the same (with 1,2,3,4 instead of external variables) C source Clang 13 gives ASM output like in the first message.

#include <stdint.h>

int firewall(uint8_t *data) {
    uint8_t ip_proto = *data;
    uint16_t dst_port = *(uint16_t *)data;
    if (ip_proto == 17 && dst_port == 1) return 1;
    if (ip_proto == 17 && dst_port == 2) return 1;
    if (ip_proto == 17 && dst_port == 3) return 1;
    if (ip_proto == 17 && dst_port == 4) return 1;
    return 0;
}
firewall:                               # @firewall
        movzx   eax, word ptr [rdi]
        cmp     byte ptr [rdi], 17
        sete    cl
        dec     eax
        cmp     ax, 4
        setb    al
        and     al, cl
        movzx   eax, al
        ret
socketpair commented 2 years ago

Another thing. Even for Clang12 (-O4):

for code

#include <stdint.h>

int firewall(uint8_t *data) {
    uint8_t ip_proto = *data;
    uint16_t dst_port = *(uint16_t *)data;
    if (ip_proto == 17 && dst_port == 11) return 1;
    if (ip_proto == 17 && dst_port == 22) return 1;
    if (ip_proto == 17 && dst_port == 33) return 1;
    if (ip_proto == 17 && dst_port == 44) return 1;
    return 0;
}

It gives:

.LCPI0_0:
        .short  44                              # 0x2c
        .short  33                              # 0x21
        .short  11                              # 0xb
        .short  22                              # 0x16
        .zero   2
        .zero   2
        .zero   2
        .zero   2
firewall:                               # @firewall
        cmp     byte ptr [rdi], 17
        sete    al
        vpbroadcastw    xmm0, word ptr [rdi]
        vpcmpeqw        xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        vpmovsxwd       xmm0, xmm0
        vmovmskps       ecx, xmm0
        test    cl, cl
        setne   cl
        and     cl, al
        movzx   eax, cl
        ret

But if I remove comparison with 17:

#include <stdint.h>

int firewall(uint8_t *data) {
    uint8_t ip_proto = *data;
    uint16_t dst_port = *(uint16_t *)data;
    if (dst_port == 11) return 1;
    if (dst_port == 22) return 1;
    if (dst_port == 33) return 1;
    if (dst_port == 44) return 1;
    return 0;
}

it gives beautiful ASM code:

firewall:                               # @firewall
        movzx   ecx, word ptr [rdi]
        cmp     rcx, 44
        ja      .LBB0_2
        mov     eax, 1
        movabs  rdx, 17600780175360
        bt      rdx, rcx
        jae     .LBB0_2
        ret
.LBB0_2:
        xor     eax, eax
        ret

So, WHAT'S happening ? gcc even ancient versions does not have such problems and generate good code.

nickdesaulniers commented 2 years ago

CLang 12 does not have this problem:

There's a bunch of conflicting test cases here, but the original reported issue doesn't look like a regression to me. https://godbolt.org/z/f6K78Ex8h Test other versions of clang, too.

cc @zmodem

socketpair commented 2 years ago

@nickdesaulniers What about this: https://godbolt.org/z/h6zbWvr6n ?

nickdesaulniers commented 2 years ago

Yeah, that's a more demonstrative-of-regression test case. Seeing the repeated comparisons of an unchanged register against a constant is disappointing. Looks like a critical edge we could have split perhaps; the IR looks nice, so it's probably isel.

Bisecting LLVM converged on: 5c2e50b5d2412f4e1fdb48bce04917e1b6a29cf9 cc @aqjune Reported here, too. Tentatively marking as a regression.

LebedevRI commented 2 years ago

Hm, let me take a look.

LebedevRI commented 2 years ago

After f6b60b3b79606612e9df6b3ab8d4367ca673fedc, we now end up with

firewall:                               # @firewall
        .cfi_startproc
# %bb.0:                                # %entry
        vpbroadcastw    64(%rdi), %xmm0
        cmpb    $17, (%rdi)
        vpcmpeqw        .LCPI0_0(%rip), %xmm0, %xmm0
        sete    %al
        vmovd   %eax, %xmm1
        xorl    %eax, %eax
        vpbroadcastb    %xmm1, %xmm1
        vpmovsxwd       %xmm0, %xmm0
        vpand   %xmm0, %xmm1, %xmm0
        vpslld  $31, %xmm0, %xmm0
        vmovmskps       %xmm0, %ecx
        testl   %ecx, %ecx
        setne   %al
        retq

... which is better but still a regression from before poison safety patches. The missing part here is https://github.com/llvm/llvm-project/issues/54553.

socketpair commented 2 years ago

Wow, @LebedevRI ! Please see here https://godbolt.org/z/r57nvhGd3 Note the difference in firewall() and firewall2().

Hope this helps.

LebedevRI commented 2 years ago

Yeah, it does seem like we are also missing some variant of CFG simplification along the lines of https://alive2.llvm.org/ce/z/-qy6hh

socketpair commented 2 years ago

@LebedevRI Do I understand right, #54553 will fix my case and make assembler output like gcc ? (using a big 64-bit mask) ? If yes, please close this bug, and I will track #54553.

socketpair commented 1 year ago

@LebedevRI is there any progress on that ? I mean https://github.com/llvm/llvm-project/issues/53861#issuecomment-1079016846

socketpair commented 1 year ago

https://godbolt.org/z/GYqT96eq4 still triggered on Clang 16.

socketpair commented 1 year ago

@LebedevRI ? Possibly I should ping someone else ?

LebedevRI commented 1 year ago

I'm sorry, this wasn't on my interest list. I'm not sure what specific folds we are missing, but at least one is: https://alive2.llvm.org/ce/z/VM4PLR I think #54553 is the main issue here. cc @bcl5980 @spatel-gh

bcl5980 commented 1 year ago

This pattern looks a little complicate. Logical and/or use distributive law need to be careful with the operand position. https://alive2.llvm.org/ce/z/nM6kZb And we may need to consider normal binop also. So, there are more patterns we should detect: https://alive2.llvm.org/ce/z/khFZ2s I will try to add these patterns.

Candidate patch: https://reviews.llvm.org/D139408

socketpair commented 1 year ago

@bcl5980 Huge thanks! Starting from what versions, it will appear ?

socketpair commented 1 year ago

I will be very happy if you do backports.

socketpair commented 1 year ago

https://godbolt.org/z/sEj9Ys1d3 <- final (I believe) battle. Waiting while godbolt compiles Clang.

bcl5980 commented 1 year ago

https://godbolt.org/z/sEj9Ys1d3 <- final (I believe) battle. Waiting while godbolt compiles Clang.

It looks my change only can fix: https://godbolt.org/z/65Tza7cdf https://godbolt.org/z/sEj9Ys1d3 still need to be fixed.

bcl5980 commented 1 year ago

And actually before my change, @LebedevRI already check in a solution at https://github.com/llvm/llvm-project/commit/9ddff66d0c9c3e18d56e6b20aa26a2a8cdfb6d2b to fix the https://godbolt.org/z/65Tza7cdf So I do nothing for this ticket for now. Sorry for that.

LebedevRI commented 1 year ago

I'm not sure which patch helped here. Looks like we still need to fix SimplifyCFG to produce the select though.

socketpair commented 1 year ago

ICX gives the best (what I expected at the beginning o the topic) result for our case : https://godbolt.org/z/aExdzzYM9

socketpair commented 1 year ago

@LebedevRI @bcl5980

Sorry for bothering, but I confused. Especially about status of merging patches. So, what is merged and what is not ? What should I do in order everything to be moved further ?

LebedevRI commented 1 year ago

What should I do in order everything to be moved further ?

You -- nothing, unless you want to write the needed patches yourself. The two changes i wrote are either exposing a miscompilation somewhere else in LLVM (or are causing it), and i'm waiting on a reproducer from @alexfh. Until i get one, and deal with it, the patches are stuck.

socketpair commented 1 year ago

https://godbolt.org/z/c7zf5zWzv

image

-target bpf reveals more obviously what is happening.

socketpair commented 1 year ago

@alexfh ?

socketpair commented 1 year ago

@LebedevRI @bcl5980

https://godbolt.org/z/fh53d5fEn

Seems previous (already merged) patches are not sufficient. If condition become slightly more complex, it fails again to detect common condition.

socketpair commented 1 month ago

To wrap-up : https://godbolt.org/z/Te4459xd1

Different things happening, preventing from optimal program generation. @nickdesaulniers @LebedevRI @bcl5980 @alexfh

Please guide me what should I do next in order everything this to be fixed. Split to other issues ? Invite someone else ?