Different codegen with/without -g

llvmbot commented 5 years ago


Bugzilla Link	42138
Resolution	FIXED
Resolved on	Jan 13, 2021 13:34
Version	trunk
OS	Linux
Blocks	llvm/llvm-project#37076
Reporter	LLVM Bugzilla Contributor
CC	@gregbedwell,@MaskRay,@JDevlieghere,@jmorse,@walkerkd,@pogo59,@vedantk

Extended Description

The check-cfc tool ( https://github.com/llvm/llvm-project/tree/master/clang/utils/check_cfc ) reports a codegen difference depending on whether -g is specified.

$ cat PowerParser.ii.cc

template <typename, typename = int> class e;
class allocator {
public:
  ~allocator();
};
template <typename, typename> class e {
public:
  e(char *, allocator = allocator());
};
template <typename b, typename c, typename d> bool operator==(e<c, d>, b);
class f {
public:
  f(int *, int *, int *, int, int, int, int);
  e g();
  void j();
};
int h, i;
class k {
  void l();
  bool m_fn4();
  int m;
  int n;
  int q;
  int fmap;
};
void k::l() {
  e o = "";
  for (;;) {
    int p = 0;
    for (;;) {
      if (m_fn4())
        break;
      f a(&q, &fmap, &m, n, h, i, 0);
      if (a.g() == "")
        a.j();
    }
  }
}

$ ./llvm-project/clang/utils/check_cfc/clang++ PowerParser.ii.cc -w -c -O1 -o tmp.ll

Check CFC, checking: dash_g_no_change
PowerParser.ii.cc
Code difference detected with -g
--- /tmp/tmpcdg_LH.o
+++ /tmp/tmpfkaPkW.o
@@ -19,6 +19,6 @@
  28: 4c 8d 73 08 lea 0x8(%rbx),%r14
  2c: 4c 8d 7b 0c lea 0xc(%rbx),%r15
  30: 4c 8d 64 24 08 lea 0x8(%rsp),%r12
- 35: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
- 3c: 00 00 00
- 3f: 90 nop
+ 35: eb 09 jmp 40 <_ZN1k1lEv+0x40>
+ 37: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
+ 3e: 00 00
Diff truncated

bjope commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#43964

MaskRay commented 3 years ago

Closing the MC issue in favor of bug 48742.

MaskRay commented 3 years ago

It can be reproduced using PowerParser.ii.cc with clang 7809fa20400000fd40b4a4b56696c7fbcd0f0fa9 (committed on 2021-01-06), so I decided to reopen it.

clang -w -O1 -c PowerParser.ii.cc -o dbg.o -g
clang -w -O1 -c PowerParser.ii.cc -o rel.o

Then

objdump -d dbg.o > dbg_objdump
objdump -d rel.o > rel_objdump
diff dbg_objdump rel_objdump

2c2
< dbg.o: file format elf64-x86-64
---
> rel.o: file format elf64-x86-64
26,29c26,29
< 40: eb 0e jmp 50 <_ZN1k1lEv+0x50>
< 42: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
< 49: 00 00 00
< 4c: 0f 1f 40 00 nopl 0x0(%rax)
---
> 40: e9 0b 00 00 00 jmpq 50 <_ZN1k1lEv+0x50>
> 45: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> 4c: 00 00 00
> 4f: 90 nop

The newly reproduced issue is due to an assembler optimization in llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp

https://reviews.llvm.org/D75203#2491618

-mllvm -x86-pad-for-align=false is a workaround.

llvmbot commented 3 years ago

Both the debug and release builds produce the same LLVM IR once debug info is excluded.

This looks like an assembler bug, since with '-fno-integrated-as' passed to clang the machine code no longer differs.

llvmbot commented 3 years ago

It can be reproduced using PowerParser.ii.cc with clang 7809fa20400000fd40b4a4b56696c7fbcd0f0fa9 (committed on 2021-01-06), so I decided to reopen it.

clang -w -O1 -c PowerParser.ii.cc -o dbg.o -g
clang -w -O1 -c PowerParser.ii.cc -o rel.o

Then

objdump -d dbg.o > dbg_objdump
objdump -d rel.o > rel_objdump
diff dbg_objdump rel_objdump

2c2
< dbg.o: file format elf64-x86-64
---
> rel.o: file format elf64-x86-64
26,29c26,29
< 40: eb 0e jmp 50 <_ZN1k1lEv+0x50>
< 42: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
< 49: 00 00 00
< 4c: 0f 1f 40 00 nopl 0x0(%rax)
---
> 40: e9 0b 00 00 00 jmpq 50 <_ZN1k1lEv+0x50>
> 45: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> 4c: 00 00 00
> 4f: 90 nop

llvmbot commented 4 years ago

@Greg Bedwell Hi Greg, I just feel that the LLVM community is very welcoming and that many senior contributors take the time to kindly help a beginner, especially you. Thanks so much :)

gregbedwell commented 4 years ago

Thanks for working on this! Really happy to have you on board the LLVM project :)

llvmbot commented 4 years ago

closed by commit https://reviews.llvm.org/rGec32dff0b075055b30140c543e9f2bef608adc14

jmorse commented 5 years ago

Chris wrote:

After running the test above, there are many "unnamed alloca" entries in test.mir. Not sure whether this is a sample code issue rather than a codegen issue?

I've experienced this in the past, I think it's something weird / broken with the MIR representation -- I've never gotten to the bottom of it.

Previously I've just fiddled with my test cases until they don't generate any allocas at all. Note that you don't necessarily need a MIR test input that comes straight from a C input: you can delete and modify the MIR until it stimulates the code path you're trying to test. That means you could delete anything to do with un-named allocas in the MIR output; alternatively, you could copy and edit an existing MIR test until it represents the behaviour in branch-folder that you're trying to fix.

llvmbot commented 5 years ago

While writing a MIR test for this patch, I hit an "unnamed alloca" error. I tried compiling the code without running any passes, and the issue still exists. It looks like the sample code itself is not correct in how it handles allocator.

Steps to compile the code without running any passes:

clang++ -g -w -O1 -S -emit-llvm PowerParser.ii.cc -mllvm -opt-bisect-limit=0 -o test.ll
llc -stop-before=branch-folder test.ll -opt-bisect-limit=0 -o test.mir
llc -o - test.mir -mtriple=x86_64-- -run-pass=branch-folder

error: test.mir:298:20: alloca instruction named '' isn't defined in the function '_ZN1k1lEv'

{ id: 0, name: '', type: default, offset: -48, size: 8,

After running the test above, there are many "unnamed alloca" entries in test.mir. Not sure whether this is a sample code issue rather than a codegen issue?

llvmbot commented 5 years ago

Patch has been submitted to fix this issue: https://reviews.llvm.org/D66467

jmorse commented 5 years ago

Thanks for diagnosing -- that code definitely looks suspicious, the comment from line 378 even explains how problems could occur!

It's a little odd that most of ComputeCommonTailLength uses the nearby "countsAsInstruction" helper to skip over debug instructions, but the last two loops don't. It might be worth looking at the history / git blame a little, just to see if there's some other justification; but if your patch fixes the reproducer in this test, it's definitely worth submitting.

llvmbot commented 5 years ago

The issue seems to be caused by BranchFolderPass - "Control Flow Optimizer".

The line in question:

while (I2 != MBB2->end() && I2->isCFIInstruction()) {
https://github.com/llvm/llvm-project/blob/21599876be328ff6b5c6cf09544ade7e337cb48d/llvm/lib/CodeGen/BranchFolding.cpp#L403

While handling the MBB2 instruction list below, debug instructions should also be skipped on the goto SkipTopCFIAndReturn path. The first instruction of the MBB is a debug instruction and the second is a CFI instruction; both should be skipped in ComputeCommonTailLength. Otherwise, with "-g", the debug instruction affects a later pass (MachineBlockPlacement).

LLVM_DEBUG output:

MBB2: bb.2 (%ir-block.9):
; predecessors: %bb.1, %bb.2, %bb.6
  successors: %bb.2(0x40000000), %bb.4(0x40000000); %bb.2(50.00%), %bb.4(50.00%)
  liveins: $rbx, $r12, $r14, $r15
  DBG_VALUE 0, $noreg, !"p", !DIExpression(), debug-location !76; PowerParser.cc:29:9 line no:29
  CFI_INSTRUCTION , debug-location !77; PowerParser.cc:31:11
  $rdi = COPY renamable $rbx, debug-location !77; PowerParser.cc:31:11

After changing the code as shown here, the issue is fixed:

while (I2 != MBB2->end() && (I2->isCFIInstruction() || I2->isDebugInstr())) {
  ++I2;
}

If the analysis is correct, I would like to submit a patch to fix the issue.
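
For readers following the analysis above, here is a minimal, self-contained sketch of how the skip loop behaves. It is a toy model, not LLVM code: Kind, Instr, Block, skipLeadingMeta, landsOn and demo are hypothetical names invented for illustration, and the blocks only imitate the DBG_VALUE / CFI_INSTRUCTION / COPY sequence from the dump. It shows that a loop skipping only CFI instructions stops at a different place once a debug instruction leads the block, whereas also skipping debug instructions stops at the same real instruction with and without -g.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not llvm::MachineInstr).
enum class Kind { Real, CFI, Debug };

struct Instr {
  Kind kind;
  std::string text;
};

using Block = std::vector<Instr>;

// Returns the index of the first instruction that is not skipped.
// CFI instructions are always skipped; debug instructions only if skipDebug.
std::size_t skipLeadingMeta(const Block &bb, bool skipDebug) {
  std::size_t i = 0;
  while (i != bb.size() &&
         (bb[i].kind == Kind::CFI ||
          (skipDebug && bb[i].kind == Kind::Debug)))
    ++i;
  return i;
}

// Text of the instruction the skip loop stops at, or "<end>" if none is left.
std::string landsOn(const Block &bb, bool skipDebug) {
  std::size_t i = skipLeadingMeta(bb, skipDebug);
  return i == bb.size() ? std::string("<end>") : bb[i].text;
}

// Compares a block built without -g against the same block built with -g.
void demo() {
  Block withoutG = {{Kind::CFI, "CFI_INSTRUCTION"}, {Kind::Real, "COPY"}};
  Block withG = {{Kind::Debug, "DBG_VALUE"},
                 {Kind::CFI, "CFI_INSTRUCTION"},
                 {Kind::Real, "COPY"}};

  // CFI-only skipping: the two builds stop at different instructions.
  assert(landsOn(withoutG, false) == "COPY");
  assert(landsOn(withG, false) == "DBG_VALUE");

  // Skipping debug instructions too: both builds stop at the same one.
  assert(landsOn(withoutG, true) == "COPY");
  assert(landsOn(withG, true) == "COPY");
}
```

Calling demo() exercises both variants. The second pair of assertions corresponds to the proposed condition that checks isCFIInstruction() or isDebugInstr(); whether that matches the committed patch in every detail is not something this sketch claims.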

llvmbot commented 5 years ago

Next, I stepped -opt-bisect-limit= down from the maximum and found that the difference first appears at pass 133, "Branch Probability Basic Block Placement on function (_ZN1k1lEv)". With "-g" the corresponding pass number is 137. I'm not sure whether this is the correct way to debug it.

clang++ -mllvm -opt-bisect-limit=133 PowerParser.cc -c -O1 -o a.o
objdump -d a.o > a.obj

clang++ -mllvm -opt-bisect-limit=137 PowerParser.cc -c -O1 -g -o ag.o
objdump -d ag.o > ag.obj

baseline:
35: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
3c: 00 00 00
3f: 90 nop
40: eb 0e jmp 50 <_ZN1k1lEv+0x50>
42: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
49: 00 00 00
4c: 0f 1f 40 00 nopl 0x0(%rax)

with debug:
35: eb 09 jmp 40 <_ZN1k1lEv+0x40>
37: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)

But from this assembly alone, it is still not easy to find the offending code.
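
The manual max-to-min search described above could be narrowed with a binary search over the limit value. The sketch below is a hedged illustration only: firstDivergingLimit and the differs callback are hypothetical names, the callback stands in for "build with and without -g at this limit and diff the disassembly", and the search assumes that once the outputs diverge at some limit they stay divergent for every larger limit. Since the pass numbering differs between the two builds (133 versus 137 here), a real callback would also have to map a limit in one build to the corresponding limit in the other, which this sketch does not attempt.

```cpp
#include <cassert>
#include <functional>

// Returns the smallest limit in [lo, hi] for which differs(limit) is true,
// or hi + 1 if the outputs never differ in that range.
// Assumption: differs is monotonic (false ... false true ... true).
int firstDivergingLimit(int lo, int hi,
                        const std::function<bool(int)> &differs) {
  assert(lo <= hi);
  int result = hi + 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (differs(mid)) {
      result = mid; // divergence seen; look for an earlier one
      hi = mid - 1;
    } else {
      lo = mid + 1; // still identical; divergence must come later
    }
  }
  return result;
}
```

The limit it reports is only where the difference first becomes visible; the pass at that position is not necessarily the one that contains the bug.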

gregbedwell commented 5 years ago

One thing to note is that it is not necessarily the SROA pass that's actually causing the issue. Even though you're telling the compiler to not run any passes after SROA it's still having to do a bunch of work later on in order to actually emit the code.

It might just be the case that SROA is perfectly validly making some change to the code which happens to allow a code path later on containing the bug to be triggered which otherwise wouldn't have been.

llvmbot commented 5 years ago

More debug info: using "-mllvm -opt-bisect-limit=2 -c -O3", I found a difference; the SROA pass seems to have an issue there. I will continue to look inside the SROA pass.

BISECT: running pass (1) Simplify the CFG on function (_ZN1k1lEv)
BISECT: running pass (2) SROA on function (_ZN1k1lEv)
BISECT: NOT running pass (3) Early CSE on function (_ZN1k1lEv)

< a.o: file format elf64-x86-64
---
> b.o: file format elf64-x86-64
18,55c18,54
< 28: eb 00 jmp 2a <_ZN1k1lEv+0x2a>
< 2a: 48 89 df mov %rbx,%rdi
< 2d: e8 00 00 00 00 callq 32 <_ZN1k1lEv+0x32>
< 32: a8 01 test $0x1,%al
< 34: 0f 85 86 00 00 00 jne c0 <_ZN1k1lEv+0xc0>
< 3a: eb 15 jmp 51 <_ZN1k1lEv+0x51>
< 3c: 48 89 c3 mov %rax,%rbx
< 3f: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi
< 44: e8 00 00 00 00 callq 49 <_ZN1k1lEv+0x49>
< 49: 48 89 df mov %rbx,%rdi
< 4c: e8 00 00 00 00 callq 51 <_ZN1k1lEv+0x51>
< 51: 31 c0 xor %eax,%eax
< 53: 48 89 de mov %rbx,%rsi
< 56: 48 81 c6 08 00 00 00 add $0x8,%rsi
< 5d: 48 89 da mov %rbx,%rdx
< 60: 48 81 c2 0c 00 00 00 add $0xc,%rdx
< 67: 44 8b 43 04 mov 0x4(%rbx),%r8d
< 6b: 44 8b 0c 25 00 00 00 mov 0x0,%r9d
< 72: 00
< 73: 8b 04 25 00 00 00 00 mov 0x0,%eax
< 7a: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi
< 7f: 48 89 d9 mov %rbx,%rcx
< 82: 89 04 24 mov %eax,(%rsp)
< 85: c7 44 24 08 00 00 00 movl $0x0,0x8(%rsp)
< 8c: 00
< 8d: e8 00 00 00 00 callq 92 <_ZN1k1lEv+0x92>
< 92: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi
< 97: e8 00 00 00 00 callq 9c <_ZN1k1lEv+0x9c>
< 9c: 48 bf 00 00 00 00 00 movabs $0x0,%rdi
< a3: 00 00 00
< a6: e8 00 00 00 00 callq ab <_ZN1k1lEv+0xab>
< ab: a8 01 test $0x1,%al
< ad: 75 02 jne b1 <_ZN1k1lEv+0xb1>
< af: eb 0a jmp bb <_ZN1k1lEv+0xbb>
< b1: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi
< b6: e8 00 00 00 00 callq bb <_ZN1k1lEv+0xbb>
< bb: e9 6a ff ff ff jmpq 2a <_ZN1k1lEv+0x2a>
< c0: e9 63 ff ff ff jmpq 28 <_ZN1k1lEv+0x28>
---
> 28: 48 89 df mov %rbx,%rdi
> 2b: e8 00 00 00 00 callq 30 <_ZN1k1lEv+0x30>
> 30: a8 01 test $0x1,%al
> 32: 0f 85 86 00 00 00 jne be <_ZN1k1lEv+0xbe>
> 38: eb 15 jmp 4f <_ZN1k1lEv+0x4f>
> 3a: 48 89 c3 mov %rax,%rbx
> 3d: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi
> 42: e8 00 00 00 00 callq 47 <_ZN1k1lEv+0x47>
> 47: 48 89 df mov %rbx,%rdi
> 4a: e8 00 00 00 00 callq 4f <_ZN1k1lEv+0x4f>
> 4f: 31 c0 xor %eax,%eax
> 51: 48 89 de mov %rbx,%rsi
> 54: 48 81 c6 08 00 00 00 add $0x8,%rsi
> 5b: 48 89 da mov %rbx,%rdx
> 5e: 48 81 c2 0c 00 00 00 add $0xc,%rdx
> 65: 44 8b 43 04 mov 0x4(%rbx),%r8d
> 69: 44 8b 0c 25 00 00 00 mov 0x0,%r9d
> 70: 00
> 71: 8b 04 25 00 00 00 00 mov 0x0,%eax
> 78: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi
> 7d: 48 89 d9 mov %rbx,%rcx
> 80: 89 04 24 mov %eax,(%rsp)
> 83: c7 44 24 08 00 00 00 movl $0x0,0x8(%rsp)
> 8a: 00
> 8b: e8 00 00 00 00 callq 90 <_ZN1k1lEv+0x90>
> 90: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi
> 95: e8 00 00 00 00 callq 9a <_ZN1k1lEv+0x9a>
> 9a: 48 bf 00 00 00 00 00 movabs $0x0,%rdi
> a1: 00 00 00
> a4: e8 00 00 00 00 callq a9 <_ZN1k1lEv+0xa9>
> a9: a8 01 test $0x1,%al
> ab: 75 02 jne af <_ZN1k1lEv+0xaf>
> ad: eb 0a jmp b9 <_ZN1k1lEv+0xb9>
> af: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi
> b4: e8 00 00 00 00 callq b9 <_ZN1k1lEv+0xb9>
> b9: e9 6a ff ff ff jmpq 28 <_ZN1k1lEv+0x28>
> be: e9 65 ff ff ff jmpq 28 <_ZN1k1lEv+0x28>

llvmbot commented 5 years ago

Not sure whether the research below is helpful:

Comparing the code built with "-c -O3" and "-c -O3 -g" shows small differences. See: https://godbolt.org/z/KeVedB

Looking at line 29:
e o = "";

the differences are:

baseline:
jmp 40 <k::l()+0x40>
nop WORD PTR [rax+rax*1+0x0]

with debug:
nop WORD PTR cs:[rax+rax*1+0x0]
nop

I tried to analyze this more deeply, but it's hard for me to find the code responsible for the difference between "-O" and "-g" when comparing the line e o = ""; against the .ll output. I would very much appreciate any suggestions.
