llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Since QEMU v8.1.0 lldb loads qemu itself not the emulated binary #87463

Open DavidSpickett opened 4 months ago

DavidSpickett commented 4 months ago

Since https://gitlab.com/qemu-project/qemu/-/commit/dc14a7a6e95571122ec2428abb355fe2c43e05c6, qemu userspace emulation returns a valid PID, which is that of qemu itself. This fools lldb into loading qemu rather than the emulated binary, so any breakpoints are placed in qemu itself if the host and emulated architectures happen to match.

$ cat /tmp/test.c
int main() { return 0; }
$ ./bin/clang /tmp/test.c -o /tmp/test.o -g
$ ./bin/lldb /tmp/test.o
(lldb) target create "/tmp/test.o"
Current executable set to '/tmp/test.o' (aarch64).
(lldb) gdb-remote 8888
Process 1193628 stopped
* thread #1, stop reason = signal SIGTRAP
    frame #0: 0x0000ffff9ef74100
->  0xffff9ef74100: mov    x0, sp
    0xffff9ef74104: bl     0xffff9ef74b80
    0xffff9ef74108: mov    x21, x0
    0xffff9ef7410c: ldr    x1, [sp]
(lldb) image list
[  0] A4DFF317-8466-F3BA-678C-DB3572EA7B04-61442A1A 0x0000aaaaaaa240e0 /home/david.spickett/qemu/build/qemu-aarch64
(lldb) dis main
error: 'disassemble' doesn't take any arguments.
(lldb) dis -n main
qemu-aarch64`main:
<...>
    0xaaaaaaa9f850 <+80>:   bl     0x7de10        ; qemu_init_cpu_list at cpu-common.c:39:1

Previously qemu returned a PID of 1. lldb would fail to find a binary for that PID and fall back to the one the user originally chose. You can see this happening by enabling some of the logs.

This is what used to happen:

lldb             DynamicLoaderDarwin::UseDYLDSPI: Use old DynamicLoader plugin
lldb             DynamicLoaderDarwin::UseDYLDSPI: Use old DynamicLoader plugin
lldb             DYLDRendezvous::UpdateExecutablePath exe module executable path set: '/tmp/test.o'
lldb             DynamicLoaderPOSIXDYLD::DidAttach() pid 1
lldb             <  26> send packet: $qXfer:auxv:read::0,fff#dc
lldb             <   1> read packet: +
lldb             < 325> read packet: $6c03000000000000004000aaaaaaaa0000040000000000000038000000000000000500000000000000090000000000000006000000000000000010000000000000070000000000000000505d8bffff0000080000000000000000000000000000000900000000000000f005aaaaaaaa00000b000000000000005a3b0000000000000c000000000000005a3b0000000000000d0000000000000010270000000000000e0000000000000010270000000000001000000000000000fbffffef000000001100000000000000640000000000000019000000000000004096e08bffff0000170000000000000000000000000000001f00000000000000ec9fe08bffff00001a00000000000000ff7fc77f001800000f000000000000005996e08bffff000021000000000000000040b28cffff000000000000000000000000000000000000#55
lldb             <   1> send packet: +
lldb             DynamicLoaderPOSIXDYLD::DidAttach pid 1 reloaded auxv data
lldb             DynamicLoaderPOSIXDYLD::ResolveExecutableModule - got executable by pid 1:
lldb             DYLDRendezvous::UpdateExecutablePath exe module executable path set: '/tmp/test.o'
lldb             DynamicLoaderPOSIXDYLD::DidAttach pid 1 executable '/tmp/test.o', load_offset 0xaaaaaaaa0000
lldb             Rendezvous structure is not set up yet. Trying to locate rendezvous breakpoint in the interpreter by symbol name.

This line is supposed to print a binary name if one was found, but none was:

lldb             DynamicLoaderPOSIXDYLD::ResolveExecutableModule - got executable by pid 1: <program file name would go here if we had one>

And after the QEMU change:

lldb             DynamicLoaderDarwin::UseDYLDSPI: Use old DynamicLoader plugin
lldb             DynamicLoaderDarwin::UseDYLDSPI: Use old DynamicLoader plugin
lldb             DYLDRendezvous::UpdateExecutablePath exe module executable path set: '/tmp/test.o'
lldb             DynamicLoaderPOSIXDYLD::DidAttach() pid 1189534
lldb             <  26> send packet: $qXfer:auxv:read::0,fff#dc
lldb             <   1> read packet: +
lldb             < 325> read packet: $6c03000000000000004000aaaaaaaa0000040000000000000038000000000000000500000000000000090000000000000006000000000000000010000000000000070000000000000000e05fbaffff0000080000000000000000000000000000000900000000000000f005aaaaaaaa00000b000000000000005a3b0000000000000c000000000000005a3b0000000000000d0000000000000010270000000000000e0000000000000010270000000000001000000000000000fbffffef000000001100000000000000640000000000000019000000000000004026e3baffff0000170000000000000000000000000000001f00000000000000ec2fe3baffff00001a00000000000000ff7fc77f001800000f000000000000005926e3baffff0000210000000000000000d0b4bbffff000000000000000000000000000000000000#1d
lldb             <   1> send packet: +
lldb             DynamicLoaderPOSIXDYLD::DidAttach pid 1189534 reloaded auxv data
lldb             DynamicLoaderPOSIXDYLD::ResolveExecutableModule - got executable by pid 1189534: /home/david.spickett/qemu/build/qemu-aarch64
lldb             DYLDRendezvous::UpdateExecutablePath exe module executable path set: '/home/david.spickett/qemu/build/qemu-aarch64'
lldb             DynamicLoaderPOSIXDYLD::DidAttach pid 1189534 executable '/home/david.spickett/qemu/build/qemu-aarch64', load_offset 0xaaaaaaa240e0
lldb             DynamicLoaderPOSIXDYLD::DidAttach pid 1189534 added executable '/home/david.spickett/qemu/build/qemu-aarch64' to module load list
lldb             <  16> send packet: $qfThreadInfo#bb
lldb             DynamicLoaderPOSIXDYLD::ResolveExecutableModule - got executable by pid 1189534: /home/david.spickett/qemu/build/qemu-aarch64
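
The auxv reply in these logs can be decoded by hand to see what lldb is working from. The following is a minimal sketch, not lldb's own parser. It assumes the log prints the raw reply bytes as hex (so the leading 6c is the ASCII 'l' that marks the final qXfer chunk), that entries are little-endian pairs of 64-bit integers, and that no bytes in the reply needed escaping. `decode_auxv` and `PACKET_PREFIX` are names made up for this example; the hex is the first seven entries of the reply in the "used to happen" log.

```python
import struct

# Names for the auxv keys that appear in the decoded prefix (values from <elf.h>).
AT_NAMES = {
    3: "AT_PHDR", 4: "AT_PHENT", 5: "AT_PHNUM", 6: "AT_PAGESZ",
    7: "AT_BASE", 8: "AT_FLAGS", 9: "AT_ENTRY",
}

# First seven (key, value) pairs of the logged reply, one pair per line.
PACKET_PREFIX = "".join([
    "6c",                                      # 'l': last chunk of the transfer
    "0300000000000000", "4000aaaaaaaa0000",
    "0400000000000000", "3800000000000000",
    "0500000000000000", "0900000000000000",
    "0600000000000000", "0010000000000000",
    "0700000000000000", "00505d8bffff0000",
    "0800000000000000", "0000000000000000",
    "0900000000000000", "f005aaaaaaaa0000",
])


def decode_auxv(logged_hex):
    """Decode a hex-logged qXfer:auxv:read reply into {name: value}.

    Assumption: 64-bit little-endian target and an unescaped payload.
    """
    raw = bytes.fromhex(logged_hex)
    marker, body = raw[:1], raw[1:]
    if marker not in (b"l", b"m"):
        raise ValueError("expected a qXfer reply starting with 'l' or 'm'")
    usable = len(body) - len(body) % 16        # ignore a trailing partial entry
    entries = {}
    for key, value in struct.iter_unpack("<QQ", body[:usable]):
        if key == 0:                           # AT_NULL terminates the vector
            break
        entries[AT_NAMES.get(key, f"AT_{key}")] = value
    return entries


if __name__ == "__main__":
    for name, value in decode_auxv(PACKET_PREFIX).items():
        print(f"{name:10} {value:#x}")
```

The decoded AT_PHDR (0xaaaaaaaa0040) and AT_ENTRY (0xaaaaaaaa05f0) sit just above the load_offset 0xaaaaaaaa0000 that the first log reports for /tmp/test.o. The same two values appear in the reply after the QEMU change, so the auxv data still describes the emulated program even though lldb goes on to load qemu-aarch64.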

I am emulating AArch64 on an AArch64 host, so it's possible that if they're mismatched, lldb will reject the host binary and this issue won't happen.

I am guessing this from the fact that https://gitlab.com/qemu-project/qemu/-/commit/6c78de6eb6f986b2e06e95fabad62731a44aaafd fixed a follow-up bug in QEMU when using lldb to debug Hexagon, but not this specific issue. So if they are doing x86 -> Hexagon debugging, the PID lookup may just be failing to find a compatible binary.
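
To make the three cases discussed so far concrete, here is a toy model of the behaviour as described in this issue. It is not lldb's code: `Binary`, `choose_executable`, and the qemu-hexagon path are hypothetical, and the architecture check encodes the guess above rather than anything confirmed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binary:
    path: str
    arch: str


def choose_executable(user_choice: Binary,
                      found_by_pid: Optional[Binary],
                      target_arch: str) -> Binary:
    """Pick the executable a debugger ends up with after a PID lookup.

    Toy model only; the arch check is the unconfirmed hypothesis from the thread.
    """
    # Old QEMU: PID 1 resolves to no usable binary, so the user's choice stands.
    if found_by_pid is None:
        return user_choice
    # Guess: a host binary of a different architecture is rejected.
    if found_by_pid.arch != target_arch:
        return user_choice
    # New QEMU, matching architectures: the host binary replaces the user's choice.
    return found_by_pid


if __name__ == "__main__":
    test = Binary("/tmp/test.o", "aarch64")
    qemu = Binary("/home/david.spickett/qemu/build/qemu-aarch64", "aarch64")
    print(choose_executable(test, None, "aarch64").path)   # old behaviour
    print(choose_executable(test, qemu, "aarch64").path)   # the bug reported here
```

Under this model only the last branch reproduces the problem, which would explain why a cross-architecture setup never sees it.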

If you use the qemu-user platform this issue does not happen because it does not implement GetProcessInfo.

$ cat lldb-commands
settings set platform.plugin.qemu-user.emulator-path /home/david.spickett/qemu/build/qemu-aarch64
settings set platform.plugin.qemu-user.architecture aarch64
target create --platform qemu-user /tmp/test.o
b main
log enable lldb platform
run
$ ./bin/lldb -s lldb-commands

I think the fundamental issue is that if you connect to a gdb-remote using anything that looks like a localhost address, lldb assumes it's the host platform. This is likely because that is how normal debugging works: lldb starts an lldb-server on the host on your behalf.

A lot of gdb-remotes are in fact embedded simulators e.g. msp430, console emulators, etc. which don't have anything to do with the host. I expect 99% of the time they are a different architecture, or if they do return a PID it's 1 or one that doesn't match the host.

Not sure what a fix here would be given that this localhost assumption makes other parts of lldb work properly. Perhaps there is some way to spell "localhost:port" in a way that lldb won't detect as being on the host.

This issue was discovered by a colleague who was trying to debug AArch64 SME code (emulated by qemu) on an AArch64 host (that lacks SME).

DavidSpickett commented 4 months ago

A friendly neighborhood GDB expert informs me that anything connected via target remote in gdb is treated as remote, even if the port is on the host. So gdb would not have this issue.

DavidSpickett commented 4 months ago

Turns out Pavel made a specific change to prevent the qemu-user platform from having this problem: https://github.com/llvm/llvm-project/commit/1dc39378c46643ec9d2544da671aca78e7c6967a

(I tested lldb 15, which does exhibit this problem when using qemu-user.)

llvmbot commented 4 months ago

@llvm/issue-subscribers-lldb

Author: David Spickett (DavidSpickett)

DavidSpickett commented 2 months ago

This also affects lldb 16.0, which does not include Pavel's change.