wolf2009 / google-breakpad

Automatically exported from code.google.com/p/google-breakpad

Linux stack walker does not know how to unwind through a trampoline #476

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Build Linux Breakpad, then take the attached program and header file and build them:

g++ -g example.cc -I path/to/breakpad/src \
    path/to/breakpad/src/client/linux/libbreakpad_client.a -lpthread

Run the program on a Linux distro with seccomp filtering enabled, like Ubuntu 
12.04. The program should crash and make a Breakpad crash dump.

Breakpad's minidump_stackwalk doesn't know how to walk past the trampoline:

Thread 0 (crashed)
 0  seccomp_ex!install_sigsys_handler [example.cc : 98 + 0x7]
    ...
    Found by: given as instruction pointer in context
 1  libpthread-2.15.so + 0xfcaf
    ...
    Found by: call frame info

whereas if we convert the Breakpad dump to a core file with minidump-2-core, gdb 
unwinds through the signal frame correctly:

#0  install_sigsys_handler ...
#1  <signal handler called>
#2 ...

Original issue reported on code.google.com by thestig@chromium.org on 21 Apr 2012 at 3:51

GoogleCodeExporter commented 9 years ago
Updated example program for r1031

Original comment by thestig@chromium.org on 8 Sep 2012 at 2:16

GoogleCodeExporter commented 9 years ago
Just repro'd this with breakpad r1375, gdb 7.8-gg1, linux 3.13.0-35-generic on 
Ubuntu 14.04. I'll try to dig into how gdb is able to unwind past the signal 
handler and why breakpad's getting stuck on it.

Original comment by mdemp...@chromium.org on 19 Sep 2014 at 11:25

GoogleCodeExporter commented 9 years ago
Brain dumping some info here; mostly for my own reference, and maybe useful to 
others.

Linux's kernel_sigaction struct contains an sa_restorer field, which the kernel 
uses as the return address when it sets up a signal handler's call frame. glibc's 
sigaction() sets this to __restore_rt, which is in libpthread.so (which is why 
Breakpad unwinds there before giving up). The restorer code is responsible for 
invoking the rt_sigreturn system call.
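A small probe can make this mechanism visible. The sketch below is a hypothetical test harness, not Breakpad or glibc code: it installs a handler with sigaction(), records the handler's own return address with the GCC/Clang builtin __builtin_return_address(0), and then reads sa_restorer back from glibc. It assumes x86-64 Linux with glibc, where glibc reports sa_restorer when the action is queried; on other platforms the comparison is skipped.

```cpp
#include <signal.h>
#include <cassert>

// Hypothetical probe; assumes x86-64 Linux with glibc for the comparison.
namespace {

volatile sig_atomic_t g_handler_ran = 0;
void* volatile g_return_address = nullptr;

void RecordReturnAddress(int) {
  g_handler_ran = 1;
  // On x86-64 the kernel pushes sa_restorer as this handler's return address.
  g_return_address = __builtin_return_address(0);
}

}  // namespace

struct RestorerProbe {
  bool handler_ran;
  bool comparable;       // true only where sa_restorer can be read back
  void* return_address;  // where the handler would return to
  void* restorer;        // sa_restorer as reported by sigaction()
};

RestorerProbe ProbeRestorer() {
  struct sigaction sa = {};
  sa.sa_handler = RecordReturnAddress;
  sigemptyset(&sa.sa_mask);
  struct sigaction previous = {};
  int rc = sigaction(SIGUSR1, &sa, &previous);
  assert(rc == 0);
  (void)rc;
  raise(SIGUSR1);  // the handler runs before raise() returns

  RestorerProbe probe = {g_handler_ran != 0, false, g_return_address, nullptr};
#if defined(__GLIBC__) && defined(__x86_64__)
  struct sigaction current = {};
  sigaction(SIGUSR1, nullptr, &current);
  probe.restorer = reinterpret_cast<void*>(current.sa_restorer);
  probe.comparable = true;
#endif
  sigaction(SIGUSR1, &previous, nullptr);  // restore the old disposition
  return probe;
}
```

On such a system the two addresses should match, and symbolizing either one should point at __restore_rt, the frame where Breakpad's walk currently stops.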

gdb has two mechanisms for detecting signal frames: 1) the DWARF CIE 
augmentation string contains the 'S' character, or 2) a name-based heuristic: 
the function is named "__restore_rt", or it is named "sigaction" and the 
instructions at the PC are "mov $__NR_rt_sigreturn, %rax; syscall". (See 
gdb/amd64-linux-tdep.c.)
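The detection rules above can be summarized in a short C++ sketch. The function names are hypothetical and this is not gdb's implementation; it only encodes the checks as described, using the x86-64 encoding of `mov $15, %rax; syscall` (15 is __NR_rt_sigreturn on x86-64).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// x86-64 Linux rt_sigreturn trampoline:
//   48 c7 c0 0f 00 00 00   mov $0xf, %rax   (__NR_rt_sigreturn == 15)
//   0f 05                  syscall
constexpr uint8_t kRtSigreturnCode[] = {0x48, 0xc7, 0xc0, 0x0f, 0x00,
                                        0x00, 0x00, 0x0f, 0x05};

// Hypothetical helper: do the bytes read at the frame's PC start with the
// trampoline sequence? A fuller check might also handle a PC that already
// points at the syscall instruction; that case is omitted here.
bool LooksLikeRtSigreturn(const uint8_t* code, size_t size) {
  return size >= sizeof(kRtSigreturnCode) &&
         std::memcmp(code, kRtSigreturnCode, sizeof(kRtSigreturnCode)) == 0;
}

// Simplified mirror of the two mechanisms described above.
bool IsSignalFrame(const std::string& cie_augmentation,
                   const std::string& function_name,
                   const uint8_t* code, size_t size) {
  if (cie_augmentation.find('S') != std::string::npos)
    return true;  // 1) the CIE explicitly marks a signal frame
  if (function_name == "__restore_rt")
    return true;  // 2a) known restorer name
  if (function_name == "sigaction")
    return LooksLikeRtSigreturn(code, size);  // 2b) name plus code bytes
  return false;
}
```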

However, signal-frame detection only seems to affect whether the unwinder looks 
up the caller at PC or at PC-1, and I think either should generally be okay.
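The PC versus PC-1 distinction amounts to one line of logic. In this hypothetical helper, a caller's saved PC is normally a return address (the instruction after the call), so lookups use PC-1 to stay inside the call; when the frame was interrupted by a signal, the saved PC is the interrupted instruction itself and is used as-is. The FDE for __restore_rt quoted later in this thread covers 0x1033f..0x10349, one byte before the 0x10340 address reported for frame 1, which is consistent with a PC-1 lookup still landing inside the trampoline's CFI.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the address to use for CFI / line-table lookups
// for a caller frame.
//  - Ordinary caller: the saved PC is a return address, so step back one
//    byte to stay within the call instruction (matters when the call is the
//    last instruction of a function or CFI range).
//  - Interrupted caller (below a signal frame): the saved PC is exact.
uint64_t LookupAddress(uint64_t caller_pc, bool pc_is_exact) {
  return pc_is_exact ? caller_pc : caller_pc - 1;
}
```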

stackwalker_amd64.cc (and probably the other stack walkers) has a check: "If the 
new stack pointer is at a lower address than the old, then that's clearly 
incorrect." But I don't think that's necessarily true when we use sigaltstack() 
and the alternate signal stack sits at a higher address than the interrupted 
stack: unwinding out of the handler then moves the stack pointer to a lower 
address. I think this happens to work currently because in Chrome only the main 
thread of a process (whose stack will always(?) be allocated high) uses an 
alternate stack, whereas other threads (whose stacks are allocated dynamically) 
handle their signals on their normal thread stacks.
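A hypothetical sketch of the check and of one way it could be relaxed. Breakpad's actual check in stackwalker_amd64.cc is not reproduced here; the `unwinding_out_of_signal_frame` flag assumes the walker already knows, for example from CFI like the __restore_rt entry, that it is crossing a signal frame.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical caller-SP sanity check.
// The strict rule (caller SP must not be below callee SP) holds while all
// frames live on one downward-growing stack. It can reject a valid frame
// when unwinding out of a handler that ran on an alternate signal stack
// placed above the interrupted stack.
bool CallerSpIsPlausible(uint64_t callee_sp, uint64_t caller_sp,
                         bool unwinding_out_of_signal_frame) {
  if (unwinding_out_of_signal_frame)
    return caller_sp != 0;  // the stack may legitimately switch here
  return caller_sp >= callee_sp;  // stack grows down: callers sit higher
}
```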

Running dump_syms on libpthread-2.19.so gives a bunch of warnings like:

/lib/x86_64-linux-gnu/libpthread-2.19.so, section '.eh_frame': the call frame 
entry at offset 0x3130 uses a DWARF expression to describe how to recover 
register '.ra',  but this translator cannot yet translate DWARF expressions to 
Breakpad postfix expressions
/lib/x86_64-linux-gnu/libpthread-2.19.so, section '.eh_frame': the call frame 
entry at offset 0x31a8 uses a DWARF expression to describe how to recover 
register '.cfa',  but this translator cannot yet translate DWARF expressions to 
Breakpad postfix expressions
/lib/x86_64-linux-gnu/libpthread-2.19.so, section '.eh_frame': the call frame 
entry at offset 0x31a8 uses a DWARF expression to describe how to recover 
register '$r8',  but this translator cannot yet translate DWARF expressions to 
Breakpad postfix expressions
/lib/x86_64-linux-gnu/libpthread-2.19.so, section '.eh_frame': the call frame 
entry at offset 0x31a8 uses a DWARF expression to describe how to recover 
register '$r9',  but this translator cannot yet translate DWARF expressions to 
Breakpad postfix expressions
/lib/x86_64-linux-gnu/libpthread-2.19.so, section '.eh_frame': the call frame 
entry at offset 0x31a8 uses a DWARF expression to describe how to recover 
register '$r10',  but this translator cannot yet translate DWARF expressions to 
Breakpad postfix expressions

Also, I get different results if I run dump_syms on 
/lib/x86_64-linux-gnu/libpthread-2.19.so vs 
/usr/lib/debug/lib/x86_64-linux-gnu/libpthread-2.19.so. It probably needs to be 
made aware of .gnu_debuglink?

Original comment by mdemp...@chromium.org on 20 Sep 2014 at 2:32

GoogleCodeExporter commented 9 years ago
For dump_syms, you need to run

    dump_syms /lib/x86_64-linux-gnu/libpthread-2.19.so /usr/lib/debug/lib/x86_64-linux-gnu

for it to pick up the .gnu_debuglink.

Original comment by thestig@chromium.org on 20 Sep 2014 at 2:37

GoogleCodeExporter commented 9 years ago
Oops, dump_syms already supports .gnu_debuglink; I just needed to tell it 
explicitly where to find the debug version of the file.

Original comment by mdemp...@chromium.org on 20 Sep 2014 at 2:37

GoogleCodeExporter commented 9 years ago
Looking closer at the DWARF expressions that fail to translate, most of them 
(and in particular all of the ones that affect __restore_rt) are simple one- or 
two-operation programs. Relevant excerpt from "readelf --debug-dump=frames 
/lib/x86_64-linux-gnu/libpthread-2.19.so":

000031a8 000000000000007c 0000001c FDE cie=00003190 
pc=000000000001033f..0000000000010349
  DW_CFA_def_cfa_expression (DW_OP_breg7 (rsp): 160; DW_OP_deref)
  DW_CFA_expression: r8 (r8) (DW_OP_breg7 (rsp): 40)
  DW_CFA_expression: r9 (r9) (DW_OP_breg7 (rsp): 48)
  DW_CFA_expression: r10 (r10) (DW_OP_breg7 (rsp): 56)
  DW_CFA_expression: r11 (r11) (DW_OP_breg7 (rsp): 64)
  DW_CFA_expression: r12 (r12) (DW_OP_breg7 (rsp): 72)
  DW_CFA_expression: r13 (r13) (DW_OP_breg7 (rsp): 80)
  DW_CFA_expression: r14 (r14) (DW_OP_breg7 (rsp): 88)
  DW_CFA_expression: r15 (r15) (DW_OP_breg7 (rsp): 96)
  DW_CFA_expression: r5 (rdi) (DW_OP_breg7 (rsp): 104)
  DW_CFA_expression: r4 (rsi) (DW_OP_breg7 (rsp): 112)
  DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 120)
  DW_CFA_expression: r3 (rbx) (DW_OP_breg7 (rsp): 128)
  DW_CFA_expression: r1 (rdx) (DW_OP_breg7 (rsp): 136)
  DW_CFA_expression: r0 (rax) (DW_OP_breg7 (rsp): 144)
  DW_CFA_expression: r2 (rcx) (DW_OP_breg7 (rsp): 152)
  DW_CFA_expression: r7 (rsp) (DW_OP_breg7 (rsp): 160)
  DW_CFA_expression: r16 (rip) (DW_OP_breg7 (rsp): 168)

So I think we could fairly easily recognize this handful of simple patterns and 
generate

   .cfa: $rsp 160 + ^
   $r8: $rsp 40 + ^
   $r9: $rsp 48 + ^
   ...

for the Breakpad symbol files. If that sounds right, I'll work on a CL.
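Recognizing these patterns needs only a few lines. The sketch below is an assumption about how such a translator could look, not the actual dump_syms change: it accepts `DW_OP_bregN <sleb128 offset>` optionally followed by `DW_OP_deref` (opcodes 0x70+N and 0x06 in the DWARF specification) and emits Breakpad postfix. Because a DW_CFA_expression yields the address where a register was saved, register rules get an extra `^`, whereas DW_CFA_def_cfa_expression yields the CFA value directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// x86-64 DWARF register numbers 0..15.
const char* const kAmd64DwarfRegs[] = {
    "$rax", "$rdx", "$rcx", "$rbx", "$rsi", "$rdi", "$rbp", "$rsp",
    "$r8",  "$r9",  "$r10", "$r11", "$r12", "$r13", "$r14", "$r15"};
constexpr uint8_t kDwOpBreg0 = 0x70;  // DW_OP_breg0; bregN = 0x70 + N
constexpr uint8_t kDwOpDeref = 0x06;  // DW_OP_deref

// Decode a signed LEB128 value starting at expr[*pos].
std::optional<int64_t> ReadSleb128(const std::vector<uint8_t>& expr,
                                   size_t* pos) {
  uint64_t result = 0;
  unsigned shift = 0;
  uint8_t byte = 0;
  do {
    if (*pos >= expr.size() || shift >= 64) return std::nullopt;
    byte = expr[(*pos)++];
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 64 && (byte & 0x40)) result |= ~uint64_t{0} << shift;  // sign
  return static_cast<int64_t>(result);
}

// Hypothetical translator for "DW_OP_bregN off [DW_OP_deref]".
// is_cfa_rule: true for DW_CFA_def_cfa_expression (result is the CFA),
// false for DW_CFA_expression (result is the register's save slot).
std::optional<std::string> TranslateSimpleExpression(
    const std::vector<uint8_t>& expr, bool is_cfa_rule) {
  size_t pos = 0;
  if (expr.empty() || expr[0] < kDwOpBreg0 || expr[0] > kDwOpBreg0 + 15)
    return std::nullopt;  // only rax..r15 as base registers
  std::string out = kAmd64DwarfRegs[expr[pos++] - kDwOpBreg0];
  std::optional<int64_t> offset = ReadSleb128(expr, &pos);
  if (!offset) return std::nullopt;
  bool deref = false;
  if (pos < expr.size() && expr[pos] == kDwOpDeref) {
    deref = true;
    ++pos;
  }
  if (pos != expr.size()) return std::nullopt;  // unsupported extra ops

  uint64_t magnitude = *offset >= 0 ? static_cast<uint64_t>(*offset)
                                    : 0 - static_cast<uint64_t>(*offset);
  out += " " + std::to_string(magnitude) + (*offset >= 0 ? " +" : " -");
  if (deref) out += " ^";
  if (!is_cfa_rule) out += " ^";  // load the saved register from its slot
  return out;
}
```

For the excerpt above, `DW_OP_breg7 (rsp): 160; DW_OP_deref` as a CFA rule becomes `$rsp 160 + ^`, and `DW_OP_breg7 (rsp): 40` as a register rule becomes `$rsp 40 + ^`, matching the form of the STACK CFI entry shown in the next comment.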

Original comment by mdemp...@chromium.org on 20 Sep 2014 at 3:15

GoogleCodeExporter commented 9 years ago
I hacked together something to recognize DW_OP_bregN optionally followed by 
DW_OP_deref, and now running dump_syms on libpthread-2.19.so, I get this entry 
for __restore_rt:

STACK CFI INIT 1033f a $r10: $rsp 56 + ^ $r11: $rsp 64 + ^ $r12: $rsp 72 + ^ 
$r13: $rsp 80 + ^ $r14: $rsp 88 + ^ $r15: $rsp 96 + ^ $r8: $rsp 40 + ^ $r9: 
$rsp 48 + ^ $rax: $rsp 144 + ^ $rbp: $rsp 120 + ^ $rbx: $rsp 128 + ^ $rcx: $rsp 
152 + ^ $rdi: $rsp 104 + ^ $rdx: $rsp 136 + ^ $rsi: $rsp 112 + ^ $rsp: $rsp 160 
+ ^ .cfa: $rsp 160 + ^ .ra: $rsp 168 + ^
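To see how minidump_stackwalk can use rules like these, here is a hypothetical minimal evaluator for Breakpad-style postfix expressions (register names, decimal literals, `+`, `-`, and `^` as an 8-byte load). It is not Breakpad's own evaluator; memory is modeled as a plain map, and the test values are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical evaluator for rules such as "$rsp 160 + ^".
// Assumes well-formed input (std::stoull throws on a malformed literal).
std::optional<uint64_t> EvaluatePostfix(
    const std::string& expr,
    const std::map<std::string, uint64_t>& registers,
    const std::map<uint64_t, uint64_t>& memory) {
  std::vector<uint64_t> stack;
  std::istringstream tokens(expr);
  std::string tok;
  while (tokens >> tok) {
    if (tok == "+" || tok == "-") {
      if (stack.size() < 2) return std::nullopt;
      uint64_t rhs = stack.back();
      stack.pop_back();
      uint64_t lhs = stack.back();
      stack.pop_back();
      stack.push_back(tok == "+" ? lhs + rhs : lhs - rhs);
    } else if (tok == "^") {  // dereference the address on top of the stack
      if (stack.empty()) return std::nullopt;
      auto it = memory.find(stack.back());
      if (it == memory.end()) return std::nullopt;  // unreadable memory
      stack.back() = it->second;
    } else if (tok[0] == '$' || tok[0] == '.') {  // register or .cfa/.ra
      auto it = registers.find(tok);
      if (it == registers.end()) return std::nullopt;
      stack.push_back(it->second);
    } else {
      stack.push_back(std::stoull(tok));  // decimal literal
    }
  }
  if (stack.size() != 1) return std::nullopt;
  return stack.back();
}
```

Applied to the __restore_rt entry, `.ra: $rsp 168 + ^` reads the interrupted code's rip out of the context the kernel saved on the signal stack, which is what lets the walk continue past frame 1.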

and when I run the sample program and then run minidump_stackwalk, I get a full 
stack trace past the signal handler frame:

 0  breakpad_signal + 0x2112
    rax = 0x000000000000002a   rdx = 0x00007fff15c71000
    rcx = 0x000000000000002a   rbx = 0x00007fff15c715c0
    rsi = 0x00007fff15c71130   rdi = 0x0000000000000001
    rbp = 0x00007fff15c70ff0   rsp = 0x00007fff15c70fd0
     r8 = 0x00007fff15c716c0    r9 = 0x0000000000000000
    r10 = 0x0000000000000008   r11 = 0x0000000000000246
    r12 = 0x00007fff15c71640   r13 = 0x00007fff15c71980
    r14 = 0x0000000000000000   r15 = 0x0000000000000000
    rip = 0x0000000000402112
    Found by: given as instruction pointer in context
 1  libpthread-2.19.so + 0x10340
    rbx = 0x00007fff15c715c0   rbp = 0x00000000ffffffff
    rsp = 0x00007fff15c71000   r12 = 0x00007fff15c71640
    r13 = 0x00007fff15c71980   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00007f6d7b924340
    Found by: call frame info
 2  libc-2.19.so + 0xc19a0
    rax = 0x0000000000000023   rdx = 0x0000000000000000
    rcx = 0xffffffffffffffff   rbx = 0x00007fff15c715c0
    rsi = 0x00007fff15c715b0   rdi = 0x00007fff15c715b0
    rbp = 0x00000000ffffffff   rsp = 0x00007fff15c715a8
     r8 = 0x00007fff15c716c0    r9 = 0x0000000000000000
    r10 = 0x0000000000000008   r11 = 0x0000000000000246
    r12 = 0x00007fff15c71640   r13 = 0x00007fff15c71980
    r14 = 0x0000000000000000   r15 = 0x0000000000000000
    rip = 0x00007f6d7b60f9a0
    Found by: call frame info
 3  libc-2.19.so!__sleep [sleep.c : 137 + 0xb]
    rbx = 0x00007fff15c715c0   rbp = 0x00000000ffffffff
    rsp = 0x00007fff15c715b0   r12 = 0x00007fff15c71640
    r13 = 0x00007fff15c71980   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00007f6d7b60f854
    Found by: call frame info
 4  breakpad_signal + 0x233c
    rbx = 0x0000000000000000   rbp = 0x00007fff15c718a0
    rsp = 0x00007fff15c71790   r12 = 0x0000000000401fc0
    r13 = 0x00007fff15c71980   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x000000000040233c
    Found by: call frame info
 5  libc-2.19.so!__libc_start_main [libc-start.c : 287 + 0x1a]
    rbx = 0x0000000000000000   rbp = 0x0000000000000000
    rsp = 0x00007fff15c718b0   r12 = 0x0000000000401fc0
    r13 = 0x00007fff15c71980   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00007f6d7b56fec5
    Found by: call frame info
 6  breakpad_signal + 0x1fe9
    rbx = 0x0000000000000000   rbp = 0x0000000000000000
    rsp = 0x00007fff15c71970   r12 = 0x0000000000401fc0
    r13 = 0x00007fff15c71980   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x0000000000401fe9
    Found by: call frame info
 7  0x7fff15c71978
    rbx = 0x0000000000000000   rbp = 0x0000000000000000
    rsp = 0x00007fff15c71978   r12 = 0x0000000000401fc0
    r13 = 0x00007fff15c71980   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00007fff15c71978
    Found by: call frame info

So I'll work on cleaning that up and then mail a CL.

Original comment by mdemp...@google.com on 20 Sep 2014 at 8:17