Closed llvmbot closed 4 years ago
Merged: a634a80615b
Should be safe to backport.
Eli, you reviewed this patch, do you think it is OK to backport?
https://reviews.llvm.org/rGfc937806efd71eb3803b35d0920bb7d76bc3040b
I've CC'd the release manager on the commit email in accordance to https://llvm.org/docs/HowToReleaseLLVM.html#merge-requests.
We cannot build the entire Firefox with clang11, but I confirmed the single file was compiled correctly with the LLVM head. I was really impressed by your work!
Will it be ported to 10.0.1?
Likely fixed by https://reviews.llvm.org/D80047 which was just pushed, could you try LLVM head to see if this is resolved?
Reduced test case (via llvm-reduce, with some reductions disabled since they crash llvm-reduce) which fails MIR verifications with $ bin/llc -o /dev/null -O2 -mtriple=x86_64-windows-msvc -verify-machineinstrs
define i1 @foo(i32 %index, i8 %parameter_address, i32 %size, i1 zeroext %is_in_out, i32 %type) personality i8 bitcast (i32 (...) @__C_specific_handler to i8) { entry: %tobool = icmp eq i32 %size, 0 %tobool5 = icmp ne i8* %parameter_address, null %or.cond = or i1 %tobool5, %tobool br i1 %or.cond, label %if.end7, label %return
if.end7: ; preds = %entry %cmp9 = icmp ugt i32 %size, 1024 br i1 %cmp9, label %return, label %lor.lhs.false
lor.lhs.false: ; preds = %if.end7 %conv10 = zext i32 undef to i64 %cmp12 = icmp ult i64 undef, %conv10 br i1 %cmp12, label %return, label %if.end14
if.end14: ; preds = %lor.lhs.false %call = invoke i8 @"?memcpy@@YAPEAXPEAXPEBX_K@Z"(i8 undef, i8* %parameter_address, i64 undef) to label %__try.cont unwind label %catch.dispatch
catch.dispatch: ; preds = %if.end14 %0 = catchswitch within none [label %__except] unwind to caller
__except: ; preds = %catch.dispatch %1 = catchpad within %0 [i8* null] catchret from %1 to label %return
__try.cont: ; preds = %if.end14 br i1 %is_in_out, label %if.then22, label %if.end23
if.then22: ; preds = %__try.cont br label %if.end23
if.end23: ; preds = %if.then22, %__try.cont br label %return
return: ; preds = %if.end23, %except, %lor.lhs.false, %if.end7, %entry %retval.1 = phi i1 [ false, %entry ], [ false, %lor.lhs.false ], [ false, %if.end7 ], [ true, %if.end23 ], [ false, %except ] ret i1 %retval.1 }
declare dso_local i8 @"?memcpy@@YAPEAXPEAXPEBX_K@Z"(i8, i8*, i64) local_unnamed_addr
declare dso_local i32 @__C_specific_handler(...)
Bad machine code: Using an undefined physical register
Some discussion in https://reviews.llvm.org/D80047 but that doesn't seem to be the right approach according to efriedma.
I'm now suspecting something related to Windows exception handling. I'd be surprised if there were inlining issues with the new pass manager that repro'd on Linux since some major users of clang on Linux have the new pass manager enabled by default.
Removing the try/except in CopyParamIn seems to make the Branch Probability Basic Block pass not run and the generated code looks good. Now trying to figure out why that pass doesn't run when the try/except are present in the source code.
After some more digging, it looks like it's not an inliner problem, it's the Branch Probability Basic Block MIR pass.
Manually looked through the output of $ ./bin/clang++.exe -S ~/Downloads/a.cpp --target=x86_64-windows-msvc -fexperimental-new-pass-manager -O2 -o - -mllvm --print-after-all -masm=intel on a mostly unmodified clang, just a patch to only print functions with "CopyParamIn" in the function name.
Before Branch Probability Basic Block placement:
bb.7.lor.lhs.false: ; predecessors: %bb.5 successors: %bb.8(0x40000000), %bb.10(0x40000000); %bb.8(50.00%), %bb.10(50.00%) liveins: $rbx, $rdi, $rsi, $r8 renamable $ecx = MOV32rr renamable $ebx, implicit-def $rcx renamable $eax = MOV32rr renamable $esi, implicit-def $rax renamable $r14 = LEA64r killed renamable $rcx, 2, renamable $rcx, 0, $noreg renamable $ecx = MOV32rm renamable $rdi, 4, renamable $r14, 108, $noreg, implicit-def $rcx :: (load 4 from %ir.offset_, !tbaa !16) $edx = MOV32ri 1024, implicit-def $rdx renamable $rdx = nsw SUB64rr killed renamable $rdx(tied-def 0), renamable $rax, implicit-def dead $eflags CMP64rr killed renamable $rdx, renamable $rcx, implicit-def $eflags JCC_1 %bb.10, 3, implicit killed $eflags
bb.8: ; predecessors: %bb.7 successors: %bb.9(0x80000000); %bb.9(100.00%)
bb.9.__except (landing-pad): ; predecessors: %bb.10, %bb.8 successors: %bb.14(0x80000000); %bb.14(100.00%)
renamable $eax = XOR32rr undef $eax(tied-def 0), undef $eax, implicit-def dead $eflags JMP_1 %bb.14
bb.10.if.end14: ; predecessors: %bb.7 successors: %bb.11(0x7ffff800), %bb.9(0x00000800); %bb.11(100.00%), %bb.9(0.00%) liveins: $rax, $rbx, $rcx, $rdi, $rsi, $r8, $r14 renamable $rcx = ADD64rr killed renamable $rcx(tied-def 0), renamable $rdi, implicit-def dead $eflags EH_LABEL <mcsymbol .Ltmp14> $rdx = MOV64rr killed $r8 $r8 = MOV64rr killed $rax CALL64pcrel32 @"?memcpy_wrapper@@YAPEAXPEAXPEBX_K@Z", <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $rbp $rbx $rdi $rsi $si $sih $sil $r12 $r13 $r14 $ r15 $xmm6 $xmm7 $xmm8 $xmm9 $xmm10 and 25 more...>, implicit $rsp, implicit $ssp, implicit $rcx, implicit $rdx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def dead $rax EH_LABEL <mcsymbol .Ltmp15>
After Branch Probability Basic Block placement:
bb.7.lor.lhs.false: ; predecessors: %bb.5 successors: %bb.8(0x40000000), %bb.9(0x40000000); %bb.8(50.00%), %bb.9(50.00%) liveins: $rbx, $rdi, $rsi, $r8 renamable $ecx = MOV32rr renamable $ebx, implicit-def $rcx renamable $eax = MOV32rr renamable $esi, implicit-def $rax renamable $r14 = LEA64r killed renamable $rcx, 2, renamable $rcx, 0, $noreg renamable $ecx = MOV32rm renamable $rdi, 4, renamable $r14, 108, $noreg, implicit-def $rcx :: (load 4 from %ir.offset_, !tbaa !16) $edx = MOV32ri 1024, implicit-def $rdx renamable $rdx = nsw SUB64rr killed renamable $rdx(tied-def 0), renamable $rax, implicit-def dead $eflags CMP64rr killed renamable $rdx, renamable $rcx, implicit-def $eflags JCC_1 %bb.9, 3, implicit killed $eflags
bb.8: ; predecessors: %bb.7 successors: %bb.10(0x7ffff800), %bb.13(0x00000800); %bb.10(100.00%), %bb.13(0.00%)
renamable $rcx = ADD64rr killed renamable $rcx(tied-def 0), renamable $rdi, implicit-def dead $eflags EH_LABEL <mcsymbol .Ltmp14> $rdx = MOV64rr killed $r8 $r8 = MOV64rr killed $rax CALL64pcrel32 @"?memcpy_wrapper@@YAPEAXPEAXPEBX_K@Z", <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $rbp $rbx $rdi $rsi $si $sih $sil $r12 $r13 $r14 $r15 $xmm6 $xmm7 $xmm8 $xmm9 $xmm10 and 25 more...>, implicit $rsp, implicit $ssp, implicit $rcx, implicit $rdx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def dead $rax EH_LABEL <mcsymbol .Ltmp15>
; some irrelevant code
bb.9.if.end14: ; predecessors: %bb.7 liveins: $rax, $rbx, $rcx, $rdi, $rsi, $r8, $r14
bb.13.__except (landing-pad): ; predecessors: %bb.3, %bb.5, %bb.8 successors: %bb.14(0x80000000); %bb.14(100.00%)
renamable $eax = XOR32rr undef $eax(tied-def 0), undef $eax, implicit-def dead $eflags
bb.14.return: ; predecessors: %bb.1, %bb.13, %bb.12, %bb.0 liveins: $eax $al = KILL renamable $al, implicit killed $eax SEH_Epilogue $rsp = frame-destroy ADD64ri8 $rsp(tied-def 0), 32, implicit-def dead $eflags $rbx = frame-destroy POP64r implicit-def $rsp, implicit $rsp $rdi = frame-destroy POP64r implicit-def $rsp, implicit $rsp $rsi = frame-destroy POP64r implicit-def $rsp, implicit $rsp $r14 = frame-destroy POP64r implicit-def $rsp, implicit $rsp $rbp = frame-destroy POP64r implicit-def $rsp, implicit $rsp RETQ $al
Yes you're right, sorry I got confused.
In both cases, the comparison is compiled into "cmp rdx, rcx", where rdx = sizeof(*this) - size rcx = paraminfo[index].offset_ meaning the opposite of the C++ code. So if rdx is greater than or equal to rcx, the condition in C++ is false and we need to skip the if block.
In both the repro and the expected behavior, the second condition is compiled into "cmp rdx, rcx; jae", where rdx = (sizeof(*this) - size) and rcx = paraminfo[index].offset_. So the destination of the JAE must be the code after the if statement.
Isn't this actually correct? cmp rdx, rcs; jae is equivalent to cmp rcs, rdx; jb which means if (paraminfo[index].offset_ < (sizeof(*this) - size)), jump to the basic block where eax, the return value, is zeroed out. Which is what the original code does?
I tried disabling the inliner in the old and new pass managers. They both produce the same LLVM IR as well as the same assembly for CopyParamsIn under -O2.
With an unmodified upstream clang, I compiled the repro into LLVM IR with the old and new pass manager (and made sure that they were indeed different). Then with each LLVM IR version I compiled that to assembly under both the old and new pass manager, and both pass managers compile the same LLVM IR the same way to assembly (again under -O2).
Since the inliner implementation is different between the two pass managers, I'm suspecting the inliner implementation in the new pass manager.
Thank you for looking at this!
I'm working for Mozilla and found this issue while integrating a part of the latest Chromium code into our Firefox codebase. And yes, because of this jump, Firefox fails to start all child processes!
Theoretically, Chromium will hit the same problem if it's compiled with '-O2 -fexperimental-new-pass-manager', failing to start child processes.
Taking a look.
I've been interested in burning down the remaining new pass manager failures, so a miscompile under the new pass manager is worrying.
How did you find this, bad things were happening in the final binary and you figured out that this specific jump was the issue?
Arthur, PTAL.
I reduced the PoC further (Now it's just 430 lines!). https://godbolt.org/z/Shq75H
As shown below, .LBB2_9 and .LBB2_13 point to the same address, but .LBB2_9 should be at %bb.8.
.LBB2_5:
#DEBUG_VALUE: CopyParamIn:this <- $rdi
#DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
#DEBUG_VALUE: CopyParamIn:size <- $esi
#DEBUG_VALUE: CopyParamIn:index <- $ebx
#DEBUG_VALUE: CopyParamIn:is_in_out <- undef
cmp esi, 1024
ja .LBB2_13
#DEBUG_VALUE: CopyParamIn:parameter_address <- $r8
#DEBUG_VALUE: CopyParamIn:this <- $rdi
#DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
#DEBUG_VALUE: CopyParamIn:size <- $esi
#DEBUG_VALUE: CopyParamIn:index <- $ebx
#DEBUG_VALUE: CopyParamIn:is_in_out <- undef
mov ecx, ebx
mov eax, esi
lea r14, [rcx + 2*rcx]
mov ecx, dword ptr [rdi + 4*r14 + 108]
mov edx, 1024
sub rdx, rax
cmp rdx, rcx
jae .LBB2_9 <<<< This is wrong.
#DEBUG_VALUE: CopyParamIn:parameter_address <- $r8
#DEBUG_VALUE: CopyParamIn:this <- $rdi
#DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
#DEBUG_VALUE: CopyParamIn:size <- $esi
#DEBUG_VALUE: CopyParamIn:index <- $ebx
#DEBUG_VALUE: CopyParamIn:is_in_out <- undef
add rcx, rdi
...snip...
.LBB2_9:
#DEBUG_VALUE: CopyParamIn:this <- $rdi
#DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
#DEBUG_VALUE: CopyParamIn:size <- $esi
#DEBUG_VALUE: CopyParamIn:index <- $ebx
.LBB2_13:
#DEBUG_VALUE: CopyParamIn:type <- [DW_OP_plus_uconst 88] [$rbp+0]
#DEBUG_VALUE: CopyParamIn:size <- $esi
#DEBUG_VALUE: CopyParamIn:index <- $ebx
xor eax, eax
assigned to @efriedma-quic
Extended Description
Here's a reduced PoC on Compiler Explorer with clang (trunk): https://gcc.godbolt.org/z/ZKn7Uq. (This is a reduced version of Chromium's file //sandbox/win/src/filesystem_interception.cc.)
You can see the problem in the middle column. The right column (with -Os) demonstrates an expected behavior.
The problem happens at the if statement at the line 132888. (In Chromium, it's https://source.chromium.org/chromium/chromium/src/+/master:sandbox/win/src/crosscall_params.h;l=251;drc=c8cff7f9663ce6d1ef35e5c717f43c867c3906eb.)
In both the repro and the expected behavior, the second condition is compiled into "cmp rdx, rcx; jae", where rdx = (sizeof(*this) - size) and rcx = paraminfo[index].offset_. So the destination of the JAE must be the code after the if statement.
In the repro case, however, the destination is .LBB2_9, which is the code inside the if block. I think the label .LBB2_9 must be placed at the "%bb.8".
This is not breaking Chromium because it's compiled with -Os, and -fexperimental-new-pass-manage is not set. I hit this when working on Firefox, which imports some of Chromium code, and Mozilla recently enabled -fexperimental-new-pass-manager.