llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.42k stars 12.16k forks source link

Wrong jump destination with -O2 -fexperimental-new-pass-manager #45203

Closed llvmbot closed 4 years ago

llvmbot commented 4 years ago
Bugzilla Link 45858
Resolution FIXED
Resolved on May 28, 2020 16:59
Version trunk
OS Windows NT
Blocks llvm/llvm-project#44654
Reporter LLVM Bugzilla Contributor
CC @aeubanks,@zygoloid,@rnk,@tstellar,@hjyamauchi
Fixed by commit(s) fc937806efd71eb3803b35d0920bb7d76bc3040b a634a80615b

Extended Description

Here's a reduced PoC on Compiler Explorer with clang (trunk): https://gcc.godbolt.org/z/ZKn7Uq. (This is a reduced version of Chromium's file //sandbox/win/src/filesystem_interception.cc.)

You can see the problem in the middle column. The right column (with -Os) demonstrates an expected behavior.

The problem happens at the if statement at the line 132888. (In Chromium, it's https://source.chromium.org/chromium/chromium/src/+/master:sandbox/win/src/crosscall_params.h;l=251;drc=c8cff7f9663ce6d1ef35e5c717f43c867c3906eb.)

if ((size > sizeof(*this)) ||
    (param_info_[index].offset_ > (sizeof(*this) - size))) {
  // It does not fit, abort copy.
  return false;
}

In both the repro and the expected behavior, the second condition is compiled into "cmp rdx, rcx; jae", where rdx = (sizeof(*this) - size) and rcx = paraminfo[index].offset_. So the destination of the JAE must be the code after the if statement.

In the repro case, however, the destination is .LBB2_9, which is the code inside the if block. I think the label .LBB2_9 must be placed at the "%bb.8".

This is not breaking Chromium because it's compiled with -Os, and -fexperimental-new-pass-manage is not set. I hit this when working on Firefox, which imports some of Chromium code, and Mozilla recently enabled -fexperimental-new-pass-manager.

tstellar commented 4 years ago

Merged: a634a80615b

efriedma-quic commented 4 years ago

Should be safe to backport.

tstellar commented 4 years ago

Eli, you reviewed this patch, do you think it is OK to backport?

https://reviews.llvm.org/rGfc937806efd71eb3803b35d0920bb7d76bc3040b

aeubanks commented 4 years ago

I've CC'd the release manager on the commit email in accordance to https://llvm.org/docs/HowToReleaseLLVM.html#merge-requests.

llvmbot commented 4 years ago

We cannot build the entire Firefox with clang11, but I confirmed the single file was compiled correctly with the LLVM head. I was really impressed by your work!

Will it be ported to 10.0.1?

aeubanks commented 4 years ago

Likely fixed by https://reviews.llvm.org/D80047 which was just pushed, could you try LLVM head to see if this is resolved?

aeubanks commented 4 years ago

Reduced test case (via llvm-reduce, with some reductions disabled since they crash llvm-reduce) which fails MIR verifications with $ bin/llc -o /dev/null -O2 -mtriple=x86_64-windows-msvc -verify-machineinstrs

define i1 @​foo(i32 %index, i8 %parameter_address, i32 %size, i1 zeroext %is_in_out, i32 %type) personality i8 bitcast (i32 (...) @​__C_specific_handler to i8) { entry: %tobool = icmp eq i32 %size, 0 %tobool5 = icmp ne i8* %parameter_address, null %or.cond = or i1 %tobool5, %tobool br i1 %or.cond, label %if.end7, label %return

if.end7: ; preds = %entry %cmp9 = icmp ugt i32 %size, 1024 br i1 %cmp9, label %return, label %lor.lhs.false

lor.lhs.false: ; preds = %if.end7 %conv10 = zext i32 undef to i64 %cmp12 = icmp ult i64 undef, %conv10 br i1 %cmp12, label %return, label %if.end14

if.end14: ; preds = %lor.lhs.false %call = invoke i8 @"?memcpy@@YAPEAXPEAXPEBX_K@Z"(i8 undef, i8* %parameter_address, i64 undef) to label %__try.cont unwind label %catch.dispatch

catch.dispatch: ; preds = %if.end14 %0 = catchswitch within none [label %__except] unwind to caller

__except: ; preds = %catch.dispatch %1 = catchpad within %0 [i8* null] catchret from %1 to label %return

__try.cont: ; preds = %if.end14 br i1 %is_in_out, label %if.then22, label %if.end23

if.then22: ; preds = %__try.cont br label %if.end23

if.end23: ; preds = %if.then22, %__try.cont br label %return

return: ; preds = %if.end23, %except, %lor.lhs.false, %if.end7, %entry %retval.1 = phi i1 [ false, %entry ], [ false, %lor.lhs.false ], [ false, %if.end7 ], [ true, %if.end23 ], [ false, %except ] ret i1 %retval.1 }

declare dso_local i8 @"?memcpy@@YAPEAXPEAXPEBX_K@Z"(i8, i8*, i64) local_unnamed_addr

declare dso_local i32 @​__C_specific_handler(...)

Bad machine code: Using an undefined physical register

Some discussion in https://reviews.llvm.org/D80047 but that doesn't seem to be the right approach according to efriedma.

aeubanks commented 4 years ago

I'm now suspecting something related to Windows exception handling. I'd be surprised if there were inlining issues with the new pass manager that repro'd on Linux since some major users of clang on Linux have the new pass manager enabled by default.

Removing the try/except in CopyParamIn seems to make the Branch Probability Basic Block pass not run and the generated code looks good. Now trying to figure out why that pass doesn't run when the try/except are present in the source code.

aeubanks commented 4 years ago

After some more digging, it looks like it's not an inliner problem, it's the Branch Probability Basic Block MIR pass.

Manually looked through the output of $ ./bin/clang++.exe -S ~/Downloads/a.cpp --target=x86_64-windows-msvc -fexperimental-new-pass-manager -O2 -o - -mllvm --print-after-all -masm=intel on a mostly unmodified clang, just a patch to only print functions with "CopyParamIn" in the function name.

Before Branch Probability Basic Block placement:

bb.7.lor.lhs.false: ; predecessors: %bb.5 successors: %bb.8(0x40000000), %bb.10(0x40000000); %bb.8(50.00%), %bb.10(50.00%) liveins: $rbx, $rdi, $rsi, $r8 renamable $ecx = MOV32rr renamable $ebx, implicit-def $rcx renamable $eax = MOV32rr renamable $esi, implicit-def $rax renamable $r14 = LEA64r killed renamable $rcx, 2, renamable $rcx, 0, $noreg renamable $ecx = MOV32rm renamable $rdi, 4, renamable $r14, 108, $noreg, implicit-def $rcx :: (load 4 from %ir.offset_, !tbaa !​16) $edx = MOV32ri 1024, implicit-def $rdx renamable $rdx = nsw SUB64rr killed renamable $rdx(tied-def 0), renamable $rax, implicit-def dead $eflags CMP64rr killed renamable $rdx, renamable $rcx, implicit-def $eflags JCC_1 %bb.10, 3, implicit killed $eflags

bb.8: ; predecessors: %bb.7 successors: %bb.9(0x80000000); %bb.9(100.00%)

bb.9.__except (landing-pad): ; predecessors: %bb.10, %bb.8 successors: %bb.14(0x80000000); %bb.14(100.00%)

renamable $eax = XOR32rr undef $eax(tied-def 0), undef $eax, implicit-def dead $eflags JMP_1 %bb.14

bb.10.if.end14: ; predecessors: %bb.7 successors: %bb.11(0x7ffff800), %bb.9(0x00000800); %bb.11(100.00%), %bb.9(0.00%) liveins: $rax, $rbx, $rcx, $rdi, $rsi, $r8, $r14 renamable $rcx = ADD64rr killed renamable $rcx(tied-def 0), renamable $rdi, implicit-def dead $eflags EH_LABEL <mcsymbol .Ltmp14> $rdx = MOV64rr killed $r8 $r8 = MOV64rr killed $rax CALL64pcrel32 @"?memcpy_wrapper@@YAPEAXPEAXPEBX_K@Z", <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $rbp $rbx $rdi $rsi $si $sih $sil $r12 $r13 $r14 $ r15 $xmm6 $xmm7 $xmm8 $xmm9 $xmm10 and 25 more...>, implicit $rsp, implicit $ssp, implicit $rcx, implicit $rdx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def dead $rax EH_LABEL <mcsymbol .Ltmp15>


After Branch Probability Basic Block placement:

bb.7.lor.lhs.false: ; predecessors: %bb.5 successors: %bb.8(0x40000000), %bb.9(0x40000000); %bb.8(50.00%), %bb.9(50.00%) liveins: $rbx, $rdi, $rsi, $r8 renamable $ecx = MOV32rr renamable $ebx, implicit-def $rcx renamable $eax = MOV32rr renamable $esi, implicit-def $rax renamable $r14 = LEA64r killed renamable $rcx, 2, renamable $rcx, 0, $noreg renamable $ecx = MOV32rm renamable $rdi, 4, renamable $r14, 108, $noreg, implicit-def $rcx :: (load 4 from %ir.offset_, !tbaa !​16) $edx = MOV32ri 1024, implicit-def $rdx renamable $rdx = nsw SUB64rr killed renamable $rdx(tied-def 0), renamable $rax, implicit-def dead $eflags CMP64rr killed renamable $rdx, renamable $rcx, implicit-def $eflags JCC_1 %bb.9, 3, implicit killed $eflags

bb.8: ; predecessors: %bb.7 successors: %bb.10(0x7ffff800), %bb.13(0x00000800); %bb.10(100.00%), %bb.13(0.00%)

renamable $rcx = ADD64rr killed renamable $rcx(tied-def 0), renamable $rdi, implicit-def dead $eflags EH_LABEL <mcsymbol .Ltmp14> $rdx = MOV64rr killed $r8 $r8 = MOV64rr killed $rax CALL64pcrel32 @"?memcpy_wrapper@@YAPEAXPEAXPEBX_K@Z", <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $rbp $rbx $rdi $rsi $si $sih $sil $r12 $r13 $r14 $r15 $xmm6 $xmm7 $xmm8 $xmm9 $xmm10 and 25 more...>, implicit $rsp, implicit $ssp, implicit $rcx, implicit $rdx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def dead $rax EH_LABEL <mcsymbol .Ltmp15>

; some irrelevant code

bb.9.if.end14: ; predecessors: %bb.7 liveins: $rax, $rbx, $rcx, $rdi, $rsi, $r8, $r14

bb.13.__except (landing-pad): ; predecessors: %bb.3, %bb.5, %bb.8 successors: %bb.14(0x80000000); %bb.14(100.00%)

renamable $eax = XOR32rr undef $eax(tied-def 0), undef $eax, implicit-def dead $eflags

bb.14.return: ; predecessors: %bb.1, %bb.13, %bb.12, %bb.0 liveins: $eax $al = KILL renamable $al, implicit killed $eax SEH_Epilogue $rsp = frame-destroy ADD64ri8 $rsp(tied-def 0), 32, implicit-def dead $eflags $rbx = frame-destroy POP64r implicit-def $rsp, implicit $rsp $rdi = frame-destroy POP64r implicit-def $rsp, implicit $rsp $rsi = frame-destroy POP64r implicit-def $rsp, implicit $rsp $r14 = frame-destroy POP64r implicit-def $rsp, implicit $rsp $rbp = frame-destroy POP64r implicit-def $rsp, implicit $rsp RETQ $al

aeubanks commented 4 years ago

Yes you're right, sorry I got confused.

llvmbot commented 4 years ago

In both cases, the comparison is compiled into "cmp rdx, rcx", where rdx = sizeof(*this) - size rcx = paraminfo[index].offset_ meaning the opposite of the C++ code. So if rdx is greater than or equal to rcx, the condition in C++ is false and we need to skip the if block.

aeubanks commented 4 years ago

In both the repro and the expected behavior, the second condition is compiled into "cmp rdx, rcx; jae", where rdx = (sizeof(*this) - size) and rcx = paraminfo[index].offset_. So the destination of the JAE must be the code after the if statement.

Isn't this actually correct? cmp rdx, rcs; jae is equivalent to cmp rcs, rdx; jb which means if (paraminfo[index].offset_ < (sizeof(*this) - size)), jump to the basic block where eax, the return value, is zeroed out. Which is what the original code does?

aeubanks commented 4 years ago

I tried disabling the inliner in the old and new pass managers. They both produce the same LLVM IR as well as the same assembly for CopyParamsIn under -O2.

With an unmodified upstream clang, I compiled the repro into LLVM IR with the old and new pass manager (and made sure that they were indeed different). Then with each LLVM IR version I compiled that to assembly under both the old and new pass manager, and both pass managers compile the same LLVM IR the same way to assembly (again under -O2).

Since the inliner implementation is different between the two pass managers, I'm suspecting the inliner implementation in the new pass manager.

llvmbot commented 4 years ago

Thank you for looking at this!

I'm working for Mozilla and found this issue while integrating a part of the latest Chromium code into our Firefox codebase. And yes, because of this jump, Firefox fails to start all child processes!

Theoretically, Chromium will hit the same problem if it's compiled with '-O2 -fexperimental-new-pass-manager', failing to start child processes.

aeubanks commented 4 years ago

Taking a look.

I've been interested in burning down the remaining new pass manager failures, so a miscompile under the new pass manager is worrying.

How did you find this, bad things were happening in the final binary and you figured out that this specific jump was the issue?

rnk commented 4 years ago

Arthur, PTAL.

llvmbot commented 4 years ago

I reduced the PoC further (Now it's just 430 lines!). https://godbolt.org/z/Shq75H

As shown below, .LBB2_9 and .LBB2_13 point to the same address, but .LBB2_9 should be at %bb.8.

.LBB2_5:

DEBUG_VALUE: CopyParamIn:parameter_address <- $r8

    #DEBUG_VALUE: CopyParamIn:this <- $rdi
    #DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
    #DEBUG_VALUE: CopyParamIn:size <- $esi
    #DEBUG_VALUE: CopyParamIn:index <- $ebx
    #DEBUG_VALUE: CopyParamIn:is_in_out <- undef
    cmp     esi, 1024
    ja      .LBB2_13

%bb.7:

    #DEBUG_VALUE: CopyParamIn:parameter_address <- $r8
    #DEBUG_VALUE: CopyParamIn:this <- $rdi
    #DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
    #DEBUG_VALUE: CopyParamIn:size <- $esi
    #DEBUG_VALUE: CopyParamIn:index <- $ebx
    #DEBUG_VALUE: CopyParamIn:is_in_out <- undef
    mov     ecx, ebx
    mov     eax, esi
    lea     r14, [rcx + 2*rcx]
    mov     ecx, dword ptr [rdi + 4*r14 + 108]
    mov     edx, 1024
    sub     rdx, rax
    cmp     rdx, rcx
    jae     .LBB2_9 <<<< This is wrong.

%bb.8:

    #DEBUG_VALUE: CopyParamIn:parameter_address <- $r8
    #DEBUG_VALUE: CopyParamIn:this <- $rdi
    #DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
    #DEBUG_VALUE: CopyParamIn:size <- $esi
    #DEBUG_VALUE: CopyParamIn:index <- $ebx
    #DEBUG_VALUE: CopyParamIn:is_in_out <- undef
    add     rcx, rdi

...snip...

.LBB2_9:

DEBUG_VALUE: CopyParamIn:parameter_address <- $r8

    #DEBUG_VALUE: CopyParamIn:this <- $rdi
    #DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
    #DEBUG_VALUE: CopyParamIn:size <- $esi
    #DEBUG_VALUE: CopyParamIn:index <- $ebx

.LBB2_13:

DEBUG_VALUE: CopyParamIn:this <- $rdi

    #DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
    #DEBUG_VALUE: CopyParamIn:size <- $esi
    #DEBUG_VALUE: CopyParamIn:index <- $ebx
    xor     eax, eax
llvmbot commented 4 years ago

assigned to @efriedma-quic