llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.13k stars 11.62k forks source link

Inconsistent debug line number info for inline functions #27497

Open llvmbot opened 8 years ago

llvmbot commented 8 years ago
Bugzilla Link 27123
Version trunk
OS All
Reporter LLVM Bugzilla Contributor
CC @asl,@dwblaikie,@jmorse,@pogo59,@wjristow

Extended Description

Here we have an example where there is a small function that is called 3 times, and each of those calls are inlined when compiled at -O2.

There are two versions of this test-case, a.cpp and b.cpp, that are identical except for one line in the function that is inlined: Line 10 “printf("Hello World\n");” being commented-out in b.cpp.

As we shall see, not having this printf will lead us to hit the breakpoint at Line 11 “target::Sleep(time);” for only one of the 3 inlined instances, rather than all 3.

$ cat a.cpp

include

namespace target { void Sleep(unsigned int usec); }

void inlinefunction(int time) { printf("Hello World\n"); target::Sleep(time); }

int main() { inlinefunction(100000); inlinefunction(200000); inlinefunction(500000); }

$ cat b.cpp

include

namespace target { void Sleep(unsigned int usec); }

void inlinefunction(int time) { // printf("Hello World\n"); target::Sleep(time); }

int main() { inlinefunction(100000); inlinefunction(200000); inlinefunction(500000); }

Here we have sleep.cpp which defines target::sleep().

$ cat sleep.cpp

include

namespace target { void Sleep(unsigned int usec) { usleep(usec); } }

When we compile a.cpp with sleep.cpp with “–g –O2” and run the debugger with a breakpoint on Line 11 “target::Sleep(time);”. The debugger will stop 3 times.

$ clang a.cpp sleep.cpp -g -O2 -Wall -o a_print.out $ dbg a_print.out

(gdb) b 11 Breakpoint 1 at 0x40061d: /home/chli/test/139525/a.cpp:11. (4 locations)

(gdb) run Starting program: /home/chli/test/139525/a_print.out Hello World

Breakpoint 1, inlinefunction (time=100000) at a.cpp:11 11 target::Sleep(time);

(gdb) c Continuing. Hello World

Breakpoint 1, inlinefunction (time=200000) at a.cpp:11 11 target::Sleep(time);

(gdb) c Continuing. Hello World

Breakpoint 1, inlinefunction (time=500000) at a.cpp:11 11 target::Sleep(time);

(gdb) c Continuing. [Inferior 1 (process 17993) exited normally]

When we compile b.cpp with sleep.cpp with “–g –O2” and run the debugger with a breakpoint on Line 11 “target::Sleep(time);”. Debugger will only stop once.

$ clang b.cpp sleep.cpp -g -O2 -Wall -o b_noprint.out $ gdb b_noprint.out

(gdb) b 11 Breakpoint 1 at 0x4005d0: /home/chli/test/139525/b.cpp:11. (2 locations)

(gdb) run Starting program: /home/chli/test/139525/b_noprint.out

Breakpoint 1, main () at b.cpp:16 16 inlinefunction(100000);

(gdb) c Continuing. [Inferior 1 (process 18009) exited normally]

Comparing the DWARF, we see differences in debug_info and debug_line

debug_info Section:

a_print.out LOCAL_SYMBOLS: < 1><0x0000002a> DW_TAG_subprogram DW_AT_low_pc 0x00400610 DW_AT_high_pc 21 DW_AT_APPLE_omit_frame_ptr yes(1) DW_AT_frame_base len 0x0001: 57: DW_OP_reg7 DW_AT_abstract_origin <0x00000047> < 2><0x0000003d> DW_TAG_formal_parameter DW_AT_location <loclist with 3 entries follows> [ 0]DW_OP_reg5 DW_OP_piece 4 [ 1]DW_OP_reg3 DW_OP_piece 4 [ 2]DW_OP_reg5 DW_OP_piece 4 DW_AT_abstract_origin <0x00000053>

b_noprint.out (.debug_info section) LOCAL_SYMBOLS: < 1><0x0000002a> DW_TAG_subprogram DW_AT_low_pc 0x004005d0 DW_AT_high_pc 5 DW_AT_APPLE_omit_frame_ptr yes(1) DW_AT_frame_base len 0x0001: 57: DW_OP_reg7 DW_AT_abstract_origin <0x00000047> < 2><0x0000003d> DW_TAG_formal_parameter DW_AT_location len 0x0003: 559304: DW_OP_reg5 DW_OP_piece 4 DW_AT_abstract_origin <0x00000053>

debug_line Section:

a_print.out .debug_line: line number info for a single cu Source lines (from CU-DIE at .debug_info offset 0x0000000b):

        NS new statement, BB new basic block, ET end of text sequence
        PE prologue end, EB epilogue begin
        IA=val ISA number, DI=val discriminator value
[row,col] NS BB ET PE EB IS= DI= uri: "filepath" 0x00400610 [ 9, 0] NS uri: "/home/chli/test/139525/a.cpp" 0x00400613 [ 10, 3] NS PE 0x0040061d [ 11, 3] NS 0x00400630 [ 15, 0] NS 0x00400630 [ 10, 3] NS PE 0x0040063b [ 11, 3] NS 0x00400645 [ 10, 3] NS 0x0040064f [ 11, 3] NS 0x00400659 [ 10, 3] NS 0x00400663 [ 11, 3] NS 0x0040066d [ 19, 1] NS 0x00400671 [ 19, 1] NS ET b_noprint.out .debug_line: line number info for a single cu Source lines (from CU-DIE at .debug_info offset 0x0000000b): NS new statement, BB new basic block, ET end of text sequence PE prologue end, EB epilogue begin IA=val ISA number, DI=val discriminator value [row,col] NS BB ET PE EB IS= DI= uri: "filepath" 0x004005d0 [ 9, 0] NS uri: "/home/chli/test/139525/b.cpp" 0x004005d0 [ 11, 3] NS PE 0x004005e0 [ 15, 0] NS 0x004005e0 [ 11, 3] NS PE 0x004005eb [ 11, 3] 0x004005f5 [ 11, 3] 0x004005ff [ 19, 1] NS 0x00400603 [ 19, 1] NS ET Most importantly, we see the last 2 occurrence of Line 11 above (0x004005eb and 0x004005f5) are not marked as New Statements. This is most likely the cause of the missed breakpoint.
dwblaikie commented 8 years ago

What struck me is that the example where the inlined function has two statements gets it right,

The difference with two statements is that the line table alternates location:

10 11 10 11 10 11

So each instance of 11 is a new region in the line table (we fold everything with the same location into the same region - describing the places where the line/file changes only - GCC seems to do the same, so far as I can tell)

You'd still get similar problems if the first and last instruction of the inlined code happened to be from the same line (eg: we reordered something from the last line into the top of the function - then you'd get that bleed through where the tail end of one inlining would be part of the next)

but a one-statement function does not. Probably because we rely on source location to tell us when to mark a new statement. It's a pretty good heuristic but clearly imperfect. Maybe consider the inlined-at location too?

Perhaps - but it would address only a somewhat arbitrary subset of cases, it seems. If this matters to someone, it would seem it would matter in the non-inlining case too.

[Note that the inlining info is still in tact (in my example there are 3 inlined subroutines, each describing separate instruction ranges for the 3 calls to f1) - so symbolication will still tell you which inlined call you're in]

If we implement two level line tables, I imagine we'll get this subset of fixes for free - but that's a fair bit of work, pushing all the inlining info awareness down into MC/assembler so it can generate the right line table.

pogo59 commented 8 years ago

What struck me is that the example where the inlined function has two statements gets it right, but a one-statement function does not. Probably because we rely on source location to tell us when to mark a new statement. It's a pretty good heuristic but clearly imperfect. Maybe consider the inlined-at location too?

dwblaikie commented 8 years ago

Simpler example:

void f1() {} inline attribute((always_inline)) void f2() { f1(); } int main() { f2(); f2(); f2(); }

Break 3 (the call to f1) only fires once.

But this is somewhat equivalent to this:

int main() { f1(); f1(); f1(); }

break on that line and you'll only break once even though there are 3 statements there.

GCC's DWARF emission has the same behavior, so this isn't a regression against GCC & I'm not sure anyone gets this right. I've no idea if GDB respects the IS_STMT flag if it is emitted (we've seen bugs with GDB's handling of it in the past) in the way that's suggested here.

It'll just be difficult/expensive to do - probably introducing something like a statement scope so that we can properly bound each statement. Could/would add a fair bit of extra metadata.

llvmbot commented 2 months ago

@llvm/issue-subscribers-debuginfo

Author: None (llvmbot)

| | | | --- | --- | | Bugzilla Link | [27123](https://llvm.org/bz27123) | | Version | trunk | | OS | All | | Reporter | LLVM Bugzilla Contributor | | CC | @asl,@dwblaikie,@jmorse,@pogo59,@wjristow | ## Extended Description Here we have an example where there is a small function that is called 3 times, and each of those calls are inlined when compiled at -O2. There are two versions of this test-case, a.cpp and b.cpp, that are identical except for one line in the function that is inlined: Line 10 “printf("Hello World\n");” being commented-out in b.cpp. As we shall see, not having this printf will lead us to hit the breakpoint at Line 11 “target::Sleep(time);” for only one of the 3 inlined instances, rather than all 3. $ cat a.cpp #include <stdio.h> namespace target { void Sleep(unsigned int usec); } void inlinefunction(int time) { printf("Hello World\n"); target::Sleep(time); } int main() { inlinefunction(100000); inlinefunction(200000); inlinefunction(500000); } $ cat b.cpp #include <stdio.h> namespace target { void Sleep(unsigned int usec); } void inlinefunction(int time) { // printf("Hello World\n"); target::Sleep(time); } int main() { inlinefunction(100000); inlinefunction(200000); inlinefunction(500000); } Here we have sleep.cpp which defines target::sleep(). $ cat sleep.cpp #include <unistd.h> namespace target { void Sleep(unsigned int usec) { usleep(usec); } } When we compile a.cpp with sleep.cpp with “–g –O2” and run the debugger with a breakpoint on Line 11 “target::Sleep(time);”. The debugger will stop 3 times. $ clang a.cpp sleep.cpp -g -O2 -Wall -o a_print.out $ dbg a_print.out (gdb) b 11 Breakpoint 1 at 0x40061d: /home/chli/test/139525/a.cpp:11. (4 locations) (gdb) run Starting program: /home/chli/test/139525/a_print.out Hello World Breakpoint 1, inlinefunction (time=100000) at a.cpp:11 11 target::Sleep(time); (gdb) c Continuing. Hello World Breakpoint 1, inlinefunction (time=200000) at a.cpp:11 11 target::Sleep(time); (gdb) c Continuing. Hello World Breakpoint 1, inlinefunction (time=500000) at a.cpp:11 11 target::Sleep(time); (gdb) c Continuing. [Inferior 1 (process 17993) exited normally] When we compile b.cpp with sleep.cpp with “–g –O2” and run the debugger with a breakpoint on Line 11 “target::Sleep(time);”. Debugger will only stop once. $ clang b.cpp sleep.cpp -g -O2 -Wall -o b_noprint.out $ gdb b_noprint.out (gdb) b 11 Breakpoint 1 at 0x4005d0: /home/chli/test/139525/b.cpp:11. (2 locations) (gdb) run Starting program: /home/chli/test/139525/b_noprint.out Breakpoint 1, main () at b.cpp:16 16 inlinefunction(100000); (gdb) c Continuing. [Inferior 1 (process 18009) exited normally] Comparing the DWARF, we see differences in debug_info and debug_line debug_info Section: a_print.out LOCAL_SYMBOLS: < 1><0x0000002a> DW_TAG_subprogram DW_AT_low_pc 0x00400610 DW_AT_high_pc <offset-from-lowpc>21 DW_AT_APPLE_omit_frame_ptr yes(1) DW_AT_frame_base len 0x0001: 57: DW_OP_reg7 DW_AT_abstract_origin <0x00000047> < 2><0x0000003d> DW_TAG_formal_parameter DW_AT_location <loclist with 3 entries follows> [ 0]<lowpc=0x00000000><highpc=0x00000003>DW_OP_reg5 DW_OP_piece 4 [ 1]<lowpc=0x00000003><highpc=0x0000000f>DW_OP_reg3 DW_OP_piece 4 [ 2]<lowpc=0x0000000f><highpc=0x00000015>DW_OP_reg5 DW_OP_piece 4 DW_AT_abstract_origin <0x00000053> b_noprint.out (.debug_info section) LOCAL_SYMBOLS: < 1><0x0000002a> DW_TAG_subprogram DW_AT_low_pc 0x004005d0 DW_AT_high_pc <offset-from-lowpc>5 DW_AT_APPLE_omit_frame_ptr yes(1) DW_AT_frame_base len 0x0001: 57: DW_OP_reg7 DW_AT_abstract_origin <0x00000047> < 2><0x0000003d> DW_TAG_formal_parameter DW_AT_location len 0x0003: 559304: DW_OP_reg5 DW_OP_piece 4 DW_AT_abstract_origin <0x00000053> debug_line Section: a_print.out .debug_line: line number info for a single cu Source lines (from CU-DIE at .debug_info offset 0x0000000b): NS new statement, BB new basic block, ET end of text sequence PE prologue end, EB epilogue begin IA=val ISA number, DI=val discriminator value <pc> [row,col] NS BB ET PE EB IS= DI= uri: "filepath" 0x00400610 [ 9, 0] NS uri: "/home/chli/test/139525/a.cpp" 0x00400613 [ 10, 3] NS PE 0x0040061d [ 11, 3] NS 0x00400630 [ 15, 0] NS 0x00400630 [ 10, 3] NS PE 0x0040063b [ 11, 3] NS 0x00400645 [ 10, 3] NS 0x0040064f [ 11, 3] NS 0x00400659 [ 10, 3] NS 0x00400663 [ 11, 3] NS 0x0040066d [ 19, 1] NS 0x00400671 [ 19, 1] NS ET b_noprint.out .debug_line: line number info for a single cu Source lines (from CU-DIE at .debug_info offset 0x0000000b): NS new statement, BB new basic block, ET end of text sequence PE prologue end, EB epilogue begin IA=val ISA number, DI=val discriminator value <pc> [row,col] NS BB ET PE EB IS= DI= uri: "filepath" 0x004005d0 [ 9, 0] NS uri: "/home/chli/test/139525/b.cpp" 0x004005d0 [ 11, 3] NS PE 0x004005e0 [ 15, 0] NS 0x004005e0 [ 11, 3] NS PE 0x004005eb [ 11, 3] 0x004005f5 [ 11, 3] 0x004005ff [ 19, 1] NS 0x00400603 [ 19, 1] NS ET Most importantly, we see the last 2 occurrence of Line 11 above (0x004005eb and 0x004005f5) are not marked as New Statements. This is most likely the cause of the missed breakpoint.