malor / cpython-lldb

LLDB extension for debugging Python programs
MIT License

[CI] Tests won't pass on LLDB 12+ #55

Open malor opened 2 years ago

malor commented 2 years ago

Somehow, LLDB versions 12 and newer downloaded from https://apt.llvm.org/ do not work correctly with the official Python Docker images: they fail to produce a valid stack trace after hitting an arbitrary breakpoint in libpython. From what I can tell, the problem is not with libpython or its debugging symbols, but rather with LLDB erroneously switching from the expected eh_frame CFI UnwindPlan to the x86_64 default unwind plan.

E.g. this is the output I see when trying to get a CPython-level stack trace for the following snippet of code with a breakpoint set at builtin_abs():

def f():
    g()

def g():
    abs(42)

f()
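
The issue does not show how the log was captured. As a rough sketch of one way to reproduce it, assuming the snippet above is saved as test.py (a hypothetical filename) and that the unwinder log comes from LLDB's `log enable lldb unwind` command, a small helper (the name `build_lldb_argv` is made up for illustration) can assemble the command line:

```python
import shlex


def build_lldb_argv(script="test.py", breakpoint="builtin_abs", python="python"):
    """Build an lldb command line that stops at `breakpoint` and prints a backtrace.

    `script` and `python` are placeholders; adjust them to the image under test.
    """
    commands = [
        "log enable lldb unwind",               # turn on the unwinder log channel
        f"breakpoint set --name {breakpoint}",  # stop inside libpython
        "run",
        "bt",                                   # native stack trace at the breakpoint
    ]
    argv = ["lldb", "--batch"]
    for command in commands:
        argv += ["-o", command]
    # Everything after "--" is the program to debug and its arguments.
    argv += ["--", python, script]
    return argv


if __name__ == "__main__":
    print(shlex.join(build_lldb_argv()))
```

Printing the command instead of running it keeps the sketch usable on machines without LLDB installed; the exact log options used for the output below are not known.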

I believe the critical bit is here where LLDB gets confused when loading DWARF debugging symbols from /lib64/ld-linux-x86-64.so.2; after that it no longer recognizes the symbol from the dynamic linker shared object and resorts to the default stack unwind method:

intern-state     th1/fr0 CFA is 0x7fffffffe7b0: Register rsp (7) contents are 0x7fffffffe7a8, offset is 8
intern-state     th1/fr0 initialized frame current pc is 0x7ffff7fe2590 cfa is 0x7fffffffe7b0 afa is 0xffffffffffffffff using assembly insn profiling UnwindPlan
intern-state     th1/fr0 supplying caller's saved rip (16)'s location using assembly insn profiling UnwindPlan
intern-state     th1/fr0 supplying caller's register rip (16) from the stack, saved at CFA plus offset -8 [saved at 0x7fffffffe7a8]
intern-state      th1/fr1 pc = 0x7ffff7fd5ec7
intern-state     th1/fr0 supplying caller's register rbp (6) from the live RegisterContext at frame 0
intern-state      th1/fr1 fp = 0x7fffffffea40
intern-state     th1/fr0 supplying caller's saved rsp (7)'s location using assembly insn profiling UnwindPlan
intern-state     th1/fr0 supplying caller's register rsp (7), value is CFA plus offset 0 [value is 0x7fffffffe7b0]
intern-state      th1/fr1 sp = 0x7fffffffe7b0
intern-state      th1/fr1 with pc value of 0x7ffff7fd5ec7, symbol name is '___lldb_unnamed_symbol22$$ld-2.31.so'
intern-state      th1/fr1 active row: 0x00007ffff7fd450c: CFA=rbp+16 => rbx=[CFA-56] rbp=[CFA-16] r12=[CFA-48] r13=[CFA-40] r14=[CFA-32] r15=[CFA-24] rip=[CFA-8]

intern-state     th1/fr0 supplying caller's saved rbp (6)'s location, cached
intern-state      th1/fr1 CFA is 0x7fffffffea50: Register rbp (6) contents are 0x7fffffffea40, offset is 16
intern-state      th1/fr1 m_cfa = 0x7fffffffea50 m_afa = 0xffffffffffffffff
intern-state      th1/fr1 initialized frame current pc is 0x7ffff7fd5ec7 cfa is 0x7fffffffea50 afa is 0xffffffffffffffff
intern-state     th1/fr0 supplying caller's saved rip (16)'s location, cached
intern-state      th1/fr1 requested caller's saved PC but this UnwindPlan uses a RA reg; getting rip (16) instead
intern-state      th1/fr1 supplying caller's saved rip (16)'s location using eh_frame CFI UnwindPlan
intern-state      th1/fr1 supplying caller's register rip (16) from the stack, saved at CFA plus offset -8 [saved at 0x7fffffffea48]
intern-state       th1/fr2 pc = 0x7ffff7febadf
intern-state      th1/fr1 supplying caller's saved rbp (6)'s location using eh_frame CFI UnwindPlan
intern-state      th1/fr1 supplying caller's register rbp (6) from the stack, saved at CFA plus offset -16 [saved at 0x7fffffffea40]
intern-state       th1/fr2 fp = 0x7ffff7fd44e0
intern-state      th1/fr1 supplying caller's saved rsp (7)'s location using ABI default
intern-state      th1/fr1 supplying caller's register rsp (7), value is CFA plus offset 0 [value is 0x7fffffffea50]
intern-state       th1/fr2 sp = 0x7fffffffea50
intern-state       th1/fr2 with pc value of 0x7ffff7febadf, symbol name is '___lldb_unnamed_symbol124$$ld-2.31.so'
intern-state       th1/fr2 active row: 0x00007ffff7feb5e1: CFA=rsp+144 => rbx=[CFA-56] rbp=[CFA-48] r12=[CFA-40] r13=[CFA-32] r14=[CFA-24] r15=[CFA-16] rip=[CFA-8]

intern-state      th1/fr1 supplying caller's saved rsp (7)'s location, cached
intern-state       th1/fr2 CFA is 0x7fffffffeae0: Register rsp (7) contents are 0x7fffffffea50, offset is 144
intern-state       th1/fr2 m_cfa = 0x7fffffffeae0 m_afa = 0xffffffffffffffff
intern-state       th1/fr2 initialized frame current pc is 0x7ffff7febadf cfa is 0x7fffffffeae0 afa is 0xffffffffffffffff
intern-state      th1/fr1 supplying caller's saved rip (16)'s location, cached
intern-state     (x86_64) /lib64/ld-linux-x86-64.so.2: Reading EH frame info
intern-state     th1/fr0 using architectural default unwind method
intern-state     th1/fr0 with pc value of 0x7ffff7fd5ec7, no symbol/function name is known.
intern-state     th1/fr0 0x00007ffff7fd5ec7: CFA=rbp+16 => rbp=[CFA-16] rsp=CFA+0 rip=[CFA-8]
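
To make the switch easier to spot in a long log, here is a minimal sketch that lists which unwind source is named for each `thN/frM` tag. The regular expressions are my own guess at the line format, based only on the lines quoted in this issue, and `unwind_sources` is a hypothetical helper name. Note that frame tags are reused across stops, so this is only meaningful for a log slice covering a single stack walk.

```python
import re

# Matches e.g. "th1/fr0 ... using assembly insn profiling UnwindPlan"
PLAN_RE = re.compile(r"(th\d+/fr\d+) .* using (.+?) UnwindPlan\b")
# Matches the fallback message seen in the log above
DEFAULT_RE = re.compile(r"(th\d+/fr\d+) using architectural default unwind method")

# Three lines copied from the log above
SAMPLE = """\
intern-state     th1/fr0 initialized frame current pc is 0x7ffff7fe2590 cfa is 0x7fffffffe7b0 afa is 0xffffffffffffffff using assembly insn profiling UnwindPlan
intern-state      th1/fr1 supplying caller's saved rip (16)'s location using eh_frame CFI UnwindPlan
intern-state     th1/fr0 using architectural default unwind method
"""


def unwind_sources(log_text):
    """Map each 'thN/frM' tag to the unwind sources named for it, in order of appearance."""
    sources = {}
    for line in log_text.splitlines():
        match = DEFAULT_RE.search(line)
        if match:
            frame, name = match.group(1), "architectural default"
        else:
            match = PLAN_RE.search(line)
            if not match:
                continue
            frame, name = match.group(1), match.group(2)
        seen = sources.setdefault(frame, [])
        if name not in seen:
            seen.append(name)
    return sources


if __name__ == "__main__":
    for frame, names in unwind_sources(SAMPLE).items():
        print(frame, "->", ", ".join(names))
```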

Apparently, it then remembers this decision for CPython frames as well, even though we definitely do have DWARF debugging symbols, and eh_frame-based stack unwinding should be used:

intern-state     th1/fr0 CFA is 0x7fffffffea50: Register rbp (6) contents are 0x7fffffffea40, offset is 16
intern-state     th1/fr0 initialized frame current pc is 0x7ffff7fd6b1a cfa is 0x7fffffffea50 afa is 0xffffffffffffffff using x86_64 default unwind plan UnwindPlan
intern-state     th1/fr0 supplying caller's saved rip (16)'s location using x86_64 default unwind plan UnwindPlan
intern-state     th1/fr0 supplying caller's register rip (16) from the stack, saved at CFA plus offset -8 [saved at 0x7fffffffea48]
intern-state      th1/fr1 pc = 0x7ffff7febadf
intern-state     th1/fr0 supplying caller's saved rbp (6)'s location using x86_64 default unwind plan UnwindPlan
intern-state     th1/fr0 supplying caller's register rbp (6) from the stack, saved at CFA plus offset -16 [saved at 0x7fffffffea40]
intern-state      th1/fr1 fp = 0x7ffff7fd44e0
intern-state     th1/fr0 supplying caller's saved rsp (7)'s location using x86_64 default unwind plan UnwindPlan
intern-state     th1/fr0 supplying caller's register rsp (7), value is CFA plus offset 0 [value is 0x7fffffffea50]
intern-state      th1/fr1 sp = 0x7fffffffea50
intern-state      th1/fr1 using architectural default unwind method
intern-state     th1/fr0 supplying caller's saved rbp (6)'s location, cached
intern-state      th1/fr1 CFA is 0x7ffff7fd44f0: Register rbp (6) contents are 0x7ffff7fd44e0, offset is 16
intern-state      th1/fr1 initialized frame cfa is 0x7ffff7fd44f0 afa is 0xffffffffffffffff
intern-state     th1/fr0 supplying caller's saved rip (16)'s location, cached
intern-state      th1/fr1 supplying caller's saved rip (16)'s location using x86_64 default unwind plan UnwindPlan
intern-state      th1/fr1 supplying caller's register rip (16) from the stack, saved at CFA plus offset -8 [saved at 0x7ffff7fd44e8]
intern-state       th1/fr2 pc = 0x4c56415741e58948
intern-state      th1/fr1 supplying caller's saved rbp (6)'s location using x86_64 default unwind plan UnwindPlan
intern-state      th1/fr1 supplying caller's register rbp (6) from the stack, saved at CFA plus offset -16 [saved at 0x7ffff7fd44e0]
intern-state       th1/fr2 fp = 0x9cd8058d4855
intern-state      th1/fr1 supplying caller's saved rsp (7)'s location using x86_64 default unwind plan UnwindPlan
intern-state      th1/fr1 supplying caller's register rsp (7), value is CFA plus offset 0 [value is 0x7ffff7fd44f0]
intern-state       th1/fr2 sp = 0x7ffff7fd44f0
intern-state       th1/fr2 using architectural default unwind method
intern-state       th1/fr2 pc is in a non-executable section of memory and this isn't the 2nd frame in the stack walk.
intern-state       Frame 2 invalid RegisterContext for this frame, stopping stack walk
intern-state     th1/fr0 with pc value of 0x7ffff7ddea49, symbol name is 'builtin_abs'
intern-state     th1/fr0 0x00007ffff7cbd88a: CFA=rbp+16 => rbp=[CFA-16] rsp=CFA+0 rip=[CFA-8]
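
The arithmetic behind the bogus frame 2 can be checked by hand. The sketch below only replays numbers copied from the th1/fr1 and th1/fr2 lines above, applying the default rule `CFA=rbp+16`, and then prints the two values read from the stack in memory (little-endian) order; `as_memory_bytes` is a hypothetical helper name.

```python
def as_memory_bytes(value, size=8):
    """Return the bytes of a little-endian x86-64 word in memory order."""
    return value.to_bytes(size, "little")


# Values copied from the th1/fr1 lines in the log above.
fr1_fp = 0x7ffff7fd44e0          # rbp reported for frame 1
fr1_cfa = fr1_fp + 16            # default plan: CFA = rbp + 16
saved_rbp_addr = fr1_cfa - 16    # rbp = [CFA-16]
saved_rip_addr = fr1_cfa - 8     # rip = [CFA-8]

# Values the unwinder read from those two addresses for frame 2.
fr2_fp = 0x9cd8058d4855
fr2_pc = 0x4c56415741e58948

if __name__ == "__main__":
    print(hex(fr1_cfa), hex(saved_rbp_addr), hex(saved_rip_addr))
    print(as_memory_bytes(fr2_fp).hex(" "))
    print(as_memory_bytes(fr2_pc).hex(" "))
```

The computed addresses match the log (CFA 0x7ffff7fd44f0, rip saved at 0x7ffff7fd44e8, rbp saved at 0x7ffff7fd44e0). The 16 bytes read from there come out as `55 48 8d 05 d8 9c 00 00` and `48 89 e5 41 57 41 56 4c`, which look like x86-64 instruction bytes (`55` is `push rbp`, `48 89 e5` is `mov rbp, rsp`) rather than stack addresses. If that reading is right, rbp was not holding a frame pointer at that point, which would explain why the rbp-based default plan walks into garbage.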

So far I haven't managed to find what exactly causes this problem, but my understanding is that it's not inherent to LLDB 12+ as such: LLDB 13 works just fine on my Arch Linux machine with CPython 3.10 built from source.

malor commented 2 years ago

> I believe the critical bit is here where LLDB gets confused when loading DWARF debugging symbols from /lib64/ld-linux-x86-64.so.2; after that it no longer recognizes the symbol from the dynamic linker shared object and resorts to the default stack unwind method

I don't understand this part: in the official CPython Docker images, /lib64/ld-linux-x86-64.so.2 is a symlink to /lib/x86_64-linux-gnu/ld-2.31.so:

# ls -la /lib64/
total 8
drwxr-xr-x 2 root root 4096 Dec  1 00:00 .
drwxr-xr-x 1 root root 4096 Dec 20 07:21 ..
lrwxrwxrwx 1 root root   32 Oct  2 12:47 ld-linux-x86-64.so.2 -> /lib/x86_64-linux-gnu/ld-2.31.so

so LLDB should be able to load the DWARF debugging symbols from /lib64/ld-linux-x86-64.so.2, just like it did for /lib/x86_64-linux-gnu/ld-2.31.so, but somehow that results in the unknown symbol warning.

However, if I replace the /lib64/ld-linux-x86-64.so.2 symlink with a copy of /lib/x86_64-linux-gnu/ld-2.31.so, stack traces are still broken, even though the warnings are gone: https://xsnippet.org/QxWuMd8h
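
The two experiments differ in whether the two paths name the same file or merely files with identical contents. A minimal sketch of that distinction, using placeholder files in a scratch directory rather than the real dynamic linker (`compare_paths` and `demo` are hypothetical names, and the sketch assumes a platform where creating symlinks is allowed):

```python
import filecmp
import os
import shutil
import tempfile


def compare_paths(a, b):
    """Describe how path `a` relates to path `b`."""
    return {
        "a_is_symlink": os.path.islink(a),
        # Same device and inode once symlinks are resolved
        "same_file": os.path.samefile(a, b),
        # Byte-for-byte comparison of the contents
        "same_contents": filecmp.cmp(a, b, shallow=False),
    }


def demo():
    """Recreate both situations (symlink, then copy) with stand-in files."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "ld-2.31.so")
        link = os.path.join(tmp, "ld-linux-x86-64.so.2")
        with open(target, "wb") as f:
            f.write(b"\x7fELF placeholder")

        os.symlink(target, link)
        as_symlink = compare_paths(link, target)

        # Replace the symlink with a regular copy of the target.
        os.remove(link)
        shutil.copy(target, link)
        as_copy = compare_paths(link, target)
    return as_symlink, as_copy


if __name__ == "__main__":
    for label, result in zip(("symlink", "copy"), demo()):
        print(label, result)
```

Running `compare_paths` on the real paths inside the container would confirm which of the two situations LLDB is actually looking at; it says nothing about how LLDB itself resolves module paths.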

malor commented 3 months ago

... and the tests magically pass on LLDB versions 15 and 16 (#66), but not on 12-14 or 17-19 :( I'll take a closer look to see what is special about those two.