llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Wrong debug info for step at -O1 #45385

Open llvmbot opened 4 years ago

llvmbot commented 4 years ago
Bugzilla Link: 46040
Version: trunk
OS: Linux
Attachments: the binary
Reporter: LLVM Bugzilla Contributor
CC: @dwblaikie, @JDevlieghere, @walkerkd, @OCHyams, @pogo59

Extended Description

$ clang --version
clang version 11.0.0 (/home/yibiao/.cache/yay/llvm-git/llvm-project 871beba234a83a2a02da9dedbd59b91a1bfbd7af)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ lldb --version
lldb version 11.0.0
  clang revision 871beba234a83a2a02da9dedbd59b91a1bfbd7af
  llvm revision 871beba234a83a2a02da9dedbd59b91a1bfbd7af

$ clang -O1 -g small.c

$ lldb a.out
(lldb) target create "a.out"
Current executable set to '/home/yibiao/Debugger/a.out' (x86_64).
(lldb) b main
Breakpoint 1: where = a.out`main + 1 at small.c:19:3, address = 0x00000000004011d1
(lldb) r
Process 35712 launched: '/home/yibiao/Debugger/a.out' (x86_64)
Process 35712 stopped

/**** As shown, when using step to reach line 11 (first hit), the value of "i" is 1, whereas when using step-i to reach line 11 (first hit), the value of "i" is 0, which is as expected. ****/

$ clang -O1 -g small.c; lldb a.out
(lldb) target create "a.out"
Current executable set to '/home/yibiao/Debugger/a.out' (x86_64).
(lldb) b main
Breakpoint 1: where = a.out`main + 1 at small.c:19:3, address = 0x00000000004011d1
(lldb) r
Process 35938 launched: '/home/yibiao/Debugger/a.out' (x86_64)
Process 35938 stopped

OCHyams commented 4 years ago

Please ignore comment #2. Bugzilla formatted my reply in unexpected ways and I don't seem to be able to edit or delete it. Here it is again:

Nice find! I get the feeling that the variable locations for 'i' are correct, but the line table is messed up. It looks like the prologue_end flag is on a misleading line, and we may have a misleading line number for the first instruction in the final for loop block.

$ cat -n test.c
     1  #include <stdarg.h>
     2
     3  void f(int n, ...){
     4    va_list ap;
     5    char *end;
     6    int i;
     7
     8    for(i=0; i<2; i++) {
     9      va_start(ap, n);
    10      while (1) {
    11        end = va_arg(ap, char *);
    12        if(!end) break;
    13      }
    14      va_end(ap);
    15    }
    16  }
    17
    18  int main() {
    19    f(1);
    20  }

Using clang from the 29th May 2020.

$ clang --version
clang version 11.0.0 (92063228f85bfe22a6dfe20bf01c99ffe6ff3130)
Target: x86_64-unknown-linux-gnu

$ clang test.c -O1 -g
$ llvm-dwarfdump a.out -name i
0x00000059: DW_TAG_variable
              DW_AT_location (0x00000000:
                 [0x00000000004004ca, 0x00000000004004e0): DW_OP_consts +0, DW_OP_stack_value
                 [0x00000000004004e0, 0x00000000004004e3): DW_OP_reg2 RCX
                 [0x00000000004004e3, 0x00000000004004e7): DW_OP_reg1 RDX
                 [0x00000000004004e7, 0x0000000000400539): DW_OP_reg2 RCX)
              DW_AT_name ("i")
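For readers less familiar with location lists, here is a minimal sketch of how a consumer might pick the entry for a given PC. It is only an illustration: the struct and function names (loc_entry, lookup_location, i_locs) are made up for this example and are not LLVM, lldb, or gdb APIs. The ranges are transcribed from the dump above and treated as half-open [lo, hi).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One location-list entry: a half-open address range [lo, hi) and a
 * textual description of where the variable lives within that range.
 * (Hypothetical type, for illustration only.) */
struct loc_entry {
    uint64_t lo;
    uint64_t hi;
    const char *where;
};

/* Entries transcribed from the llvm-dwarfdump output for "i". */
static const struct loc_entry i_locs[] = {
    {0x4004ca, 0x4004e0, "DW_OP_consts +0, DW_OP_stack_value"},
    {0x4004e0, 0x4004e3, "DW_OP_reg2 RCX"},
    {0x4004e3, 0x4004e7, "DW_OP_reg1 RDX"},
    {0x4004e7, 0x400539, "DW_OP_reg2 RCX"},
};

/* Return the description of the entry whose range covers pc, or NULL
 * if no entry covers it (the variable has no location at that pc). */
const char *lookup_location(const struct loc_entry *list, size_t n,
                            uint64_t pc)
{
    assert(n == 0 || list != NULL);
    for (size_t k = 0; k < n; k++) {
        if (pc >= list[k].lo && pc < list[k].hi)
            return list[k].where;
    }
    return NULL;
}
```

With these entries, lookup_location(i_locs, 4, 0x4004e9) returns the "DW_OP_reg2 RCX" description, and any pc below 0x4004ca returns NULL.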

$ llvm-dwarfdump --debug-line a.out
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000400480      3      0      1   0             0  is_stmt
0x00000000004004e0      8     18      1   0             0  is_stmt prologue_end
0x00000000004004e3      8     13      1   0             0
0x00000000004004e7      8      3      1   0             0
0x00000000004004e9      9      5      1   0             0  is_stmt
...

Using gdb, if you step with si (step to next machine instruction) into 'f' and keep going until you hit a line which is part of the for loop, you'll hit the following instruction.

----------------------------------------------------------------
| line table | disassembly                           | "i"     |
----------------------------------------------------------------
| ...        | ...                                   | undef   |
| 9 is_stmt  | 0x4004e9 mov QWORD PTR [rsp-0x70],rax | RCX (0) |
----------------------------------------------------------------

Then continuing round the loop, variable 'i' eventually increments as you'd expect.

If you instead step into 'f' with step (step to next source line), you start at the end of the prologue, according to the line table.

----------------------------------------------------------------------------
| line table             | disassembly                           | "i"     |
----------------------------------------------------------------------------
| 8 is_stmt prologue_end | 0x4004e0 lea edx,[rcx+0x1]            | RCX (0) |
| 8                      | 0x4004e3 test ecx,ecx                 | RCX (0) |
| 8                      | 0x4004e5 mov ecx,edx                  | RCX (1) |
| 8                      | 0x4004e7 jne 0x400534 <f+180>         | RCX (1) |
===== step =================================================================
| 9 is_stmt              | 0x4004e9 mov QWORD PTR [rsp-0x70],rax | RCX (1) |
----------------------------------------------------------------------------
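To make the 'step' behaviour concrete, here is a rough sketch of a simplified stepping model: starting from the row covering the current pc, stop at the next row that is marked is_stmt and has a different line number. This is an assumption about how a debugger might behave, not gdb's or lldb's actual algorithm. It ignores branches entirely and assumes the rows are sorted by address. The names line_row, row_for_pc, and next_step_stop are hypothetical, and the rows are the ones shown in the (truncated) line table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One line-table row, reduced to the fields used in this thread.
 * (Hypothetical type, for illustration only.) */
struct line_row {
    uint64_t addr;
    unsigned line;
    bool is_stmt;
    bool prologue_end;
};

/* Rows transcribed from the llvm-dwarfdump --debug-line output; the
 * real table continues past 0x4004e9. */
static const struct line_row f_rows[] = {
    {0x400480, 3, true,  false},
    {0x4004e0, 8, true,  true},
    {0x4004e3, 8, false, false},
    {0x4004e7, 8, false, false},
    {0x4004e9, 9, true,  false},
};

/* Index of the row covering pc (the last row with addr <= pc), or -1.
 * Assumes rows are sorted by ascending address. */
static int row_for_pc(const struct line_row *t, size_t n, uint64_t pc)
{
    int found = -1;
    for (size_t k = 0; k < n; k++) {
        if (t[k].addr <= pc)
            found = (int)k;
    }
    return found;
}

/* Simplified source-level step: return the address of the next row
 * that is is_stmt and has a different line than the current row.
 * Returns 0 if there is no such row in the table. */
uint64_t next_step_stop(const struct line_row *t, size_t n, uint64_t pc)
{
    assert(n == 0 || t != NULL);
    int cur = row_for_pc(t, n, pc);
    if (cur < 0)
        return 0;
    for (size_t k = (size_t)cur + 1; k < n; k++) {
        if (t[k].is_stmt && t[k].line != t[cur].line)
            return t[k].addr;
    }
    return 0;
}
```

Under this model, stepping from the prologue_end row at 0x4004e0 skips the remaining line 8 rows and stops at 0x4004e9, by which point the instructions at 0x4004e0 to 0x4004e7 have already executed.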

From my initial look, I think there are two problems at play:

1) Looking at the source, you'd expect line 8 to be roughly where the prologue ends. However, AFAICT the instruction at 0x4004e0 comes from the final block of the outer loop. This means we essentially skip the first iteration of the loop when stepping through with 'step'.

2) After the MIR pass "Branch Probability Basic Block Placement" (-block-placement), the final for loop block is moved to near the top of the function. Before this block there are 3 others, including entry. None of the instructions in those other blocks have a DebugLoc, so the first line number we encounter comes from the final while block. I don't know how prologue_end is calculated, but this set of circumstances looks suspicious.

Adding in Paul's reply so it is not hidden by this re-post.

> 2) After the MIR pass "Branch Probability Basic Block Placement" (-block-placement), the final for loop block is moved to near the top of the function. Before this block there are 3 others, including entry. None of the instructions in those other blocks have a DebugLoc, so the first line number we encounter comes from the final while block. I don't know how prologue_end is calculated, but this set of circumstances looks suspicious.

prologue_end is the first real instruction that is not marked as FrameSetup and also has a DebugLoc. If there are instructions that are incorrectly missing a DebugLoc, fixing that should fix prologue_end placement.

pogo59 commented 4 years ago

> 2) After the MIR pass "Branch Probability Basic Block Placement" (-block-placement), the final for loop block is moved to near the top of the function. Before this block there are 3 others, including entry. None of the instructions in those other blocks have a DebugLoc, so the first line number we encounter comes from the final while block. I don't know how prologue_end is calculated, but this set of circumstances looks suspicious.

prologue_end is the first real instruction that is not marked as FrameSetup and also has a DebugLoc. If there are instructions that are incorrectly missing a DebugLoc, fixing that should fix prologue_end placement.
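The rule above can be sketched roughly as follows. This models only the rule as stated in this comment, not LLVM's actual implementation. The minst struct, its fields, and find_prologue_end are hypothetical names, and line 0 stands in for "no DebugLoc". The example stream is made up apart from the lea instruction, which is the one reported at 0x4004e0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A machine instruction reduced to what the rule needs.
 * (Hypothetical type; line == 0 models a missing DebugLoc.) */
struct minst {
    const char *text;
    bool is_meta;      /* not a "real" instruction (e.g. a debug marker) */
    bool frame_setup;  /* marked as FrameSetup */
    unsigned line;     /* source line from the DebugLoc, 0 if none */
};

/* Return the index of the first real, non-FrameSetup instruction that
 * has a DebugLoc, or -1 if there is none. */
int find_prologue_end(const struct minst *insts, size_t n)
{
    assert(n == 0 || insts != NULL);
    for (size_t k = 0; k < n; k++) {
        if (!insts[k].is_meta && !insts[k].frame_setup &&
            insts[k].line != 0)
            return (int)k;
    }
    return -1;
}

/* Placeholder stream shaped like the situation in this bug: the early
 * blocks carry no DebugLoc, so the first candidate is the instruction
 * from the relocated loop block on line 8. */
static const struct minst example_stream[] = {
    {"frame setup (placeholder)",                    false, true,  3},
    {"entry block, no DebugLoc (placeholder)",       false, false, 0},
    {"second block, no DebugLoc (placeholder)",      false, false, 0},
    {"lea edx,[rcx+0x1]",                            false, false, 8},
};
```

For example_stream, find_prologue_end returns index 3, the line 8 instruction. If one of the earlier non-FrameSetup placeholders carried a DebugLoc, it would be chosen instead, which is the fix suggested above.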

llvmbot commented 4 years ago

Sorry. Forgot to attach the code.

$ cat small.c

#include <stdarg.h>

void f(int n, ...){
  va_list ap;
  char *end;
  int i;

  for(i=0; i<2; i++) {
    va_start(ap, n);
    while (1) {
      end = va_arg(ap, char *);
      if(!end) break;
    }
    va_end(ap);
  }
}

int main() {
  f(1);
}