llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

LLDB remote debugging fails with invalid host:port specification #61955

Closed wAuner closed 1 year ago

wAuner commented 1 year ago

I want to use lldb for remote debugging between two Macs (macOS 13.3). I followed the lldb instructions and tried it with Xcode's debugserver as well as with lldb-server and lldb 16 from brew. I want to upload and debug a locally built C++ executable called utmpx. I tested it between Apple Silicon and a VM, and between Apple Silicon and Intel; the error is always the same.

On the remote:

On the local machine:

(lldb) platform select remote-macosx
  Platform: remote-macosx
 Connected: no
  SDK Path: "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.3 (22E252) x86_64"
 SDK Roots: [ 0] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.2 (22D49) x86_64"
 SDK Roots: [ 1] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.3 (22E252) x86_64"
 SDK Roots: [ 2] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/12.5.1 (21G83) x86_64"
 SDK Roots: [ 3] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/10.16"
 SDK Roots: [ 4] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/12.6 (21G115) x86_64"
 SDK Roots: [ 5] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.0 (22A380) x86_64"
 SDK Roots: [ 6] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.1 (22C65) x86_64"
 SDK Roots: [ 7] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/12.4 (21F79) x86_64"
 SDK Roots: [ 8] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.2.1 (22D68) x86_64"
 SDK Roots: [ 9] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/12.5 (21G72) x86_64"
 SDK Roots: [10] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.0.1 (22A400) x86_64"
(lldb) platform connect connect://192.168.64.2:1234
  Platform: remote-macosx
    Triple: arm64-apple-macosx
OS Version: 13.3 (22E252)
  Hostname: 127.0.0.1
 Connected: yes
WorkingDir: /Users/venturo
    Kernel: Darwin Kernel Version 22.4.0: Mon Mar  6 20:55:35 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_VMAPPLE
  SDK Path: "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.3 (22E252) x86_64"
 SDK Roots: [ 0] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.2 (22D49) x86_64"
 SDK Roots: [ 1] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.3 (22E252) x86_64"
 SDK Roots: [ 2] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/12.5.1 (21G83) x86_64"
 SDK Roots: [ 3] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/10.16"
 SDK Roots: [ 4] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/12.6 (21G115) x86_64"
 SDK Roots: [ 5] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.0 (22A380) x86_64"
 SDK Roots: [ 6] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.1 (22C65) x86_64"
 SDK Roots: [ 7] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/12.4 (21F79) x86_64"
 SDK Roots: [ 8] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.2.1 (22D68) x86_64"
 SDK Roots: [ 9] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/12.5 (21G72) x86_64"
 SDK Roots: [10] "/Users/picard/Library/Developer/Xcode/macOS DeviceSupport/13.0.1 (22A400) x86_64"
(lldb) file utmpx
Current executable set to '/Users/picard/Developer/C++ Snippets/build/utmpx' (arm64).
(lldb) run
error: invalid host:port specification: '[192.168.64.2]'

After the platform connect command the connection is established according to the server's output.

What could cause the error? Am I doing something wrong, or is this a bug? I'm also a bit confused about why the logged SDK paths all refer to x86_64 when I'm on ARM.

llvmbot commented 1 year ago

@llvm/issue-subscribers-lldb

unvariant commented 1 year ago

I think the bug is here: port is initialized to zero but never set again to the proper port. Later, in MakeUrl, the port is left out because it is zero, resulting in the weird [<hostname>] string.
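
To make the mechanism described above concrete, here is a minimal Python sketch. It is not the LLDB source: `make_url` and `parse_host_port` are hypothetical stand-ins that only model the behaviour described in this comment (a zero port is treated as unset, so the URL ends at the bracketed host) and the error text from the original report. The parser is deliberately simplified and does not handle IPv6 literals.

```python
import re


def make_url(scheme, host, port):
    """Hypothetical stand-in for a MakeUrl-style helper.

    Assumption: a port of 0 means "not set", so it is left out of the URL.
    """
    url = f"{scheme}://[{host}]"
    if port != 0:
        url += f":{port}"
    return url


# Simplified host:port pattern: optional brackets around the host,
# then a mandatory ":<digits>" suffix.
_HOST_PORT = re.compile(r"^\[?(?P<host>[^\]:]+)\]?:(?P<port>\d+)$")


def parse_host_port(spec):
    """Split 'host:port' or '[host]:port'; reject anything without a port."""
    match = _HOST_PORT.match(spec)
    if match is None:
        raise ValueError(f"invalid host:port specification: '{spec}'")
    return match.group("host"), int(match.group("port"))


if __name__ == "__main__":
    url = make_url("connect", "192.168.64.2", 0)  # port was never filled in
    print(url)
    try:
        parse_host_port(url.split("://", 1)[1])
    except ValueError as err:
        print("error:", err)
```

With a port that was never filled in, the builder emits only the bracketed host, and any consumer that expects host:port then fails with the message seen in the report.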

DavidSpickett commented 1 year ago

At least in one path I looked at, port is set based on a value read from a qLaunchGDBServer packet received from the lldb platform.

Can you run log enable gdb-remote packets first, then try the commands again? I wonder if there's something not right with the response.

Either way, lldb certainly should be checking that port was set by someone.

DavidSpickett commented 1 year ago

Logs from @mokhaled2992 show the following:

<  39> send packet: $qLaunchGDBServer;host:C02DW6SAMD6T;#a3
Read(buffer, sizeof(buffer), timeout = 10000000 us, status = success, error = ) => bytes_read = 7
GDBRemoteCommunication::CheckForPacket adding 7 bytes: $E09#ae
<   7> read packet: $E09#ae
ProcessGDBRemote::ConnectToDebugserver Connecting to connect://[127.0.0.1]

Receiving E09 means the platform failed to launch the gdb server for some reason. Then lldb is continuing as if it succeeded, but there's no port to use of course.
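
For readers unfamiliar with the packet format in the log above, the following Python sketch decodes a `$<payload>#<checksum>` packet and recognises a plain `Exx` error reply. It is an illustration only: the function names are hypothetical, and it covers just the basic framing (checksum is the payload bytes summed modulo 256, written as two hex digits), not escaping, acknowledgements, or extended error replies.

```python
import string


def decode_packet(raw):
    """Return the payload of a '$<payload>#<xx>' packet after checking the checksum."""
    if len(raw) < 4 or not raw.startswith("$") or raw[-3] != "#":
        raise ValueError(f"malformed packet: {raw!r}")
    payload = raw[1:-3]
    expected = int(raw[-2:], 16)
    # The checksum is the sum of the payload bytes modulo 256.
    actual = sum(payload.encode("ascii")) % 256
    if actual != expected:
        raise ValueError(f"checksum mismatch: got {actual:02x}, want {expected:02x}")
    return payload


def error_code(payload):
    """Return the numeric code of a plain 'Exx' reply, or None for anything else."""
    if (
        len(payload) == 3
        and payload[0] == "E"
        and all(c in string.hexdigits for c in payload[1:])
    ):
        return int(payload[1:], 16)
    return None


if __name__ == "__main__":
    payload = decode_packet("$E09#ae")
    print(payload, error_code(payload))
```

Applied to the reply in the log, the payload is `E09`, so the transport-level read succeeded even though the request itself failed.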

DavidSpickett commented 1 year ago

This looks wrong to me: https://github.com/llvm/llvm-project/blob/538d3552900cfb772b05afcd24b13cff1236f43f/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp#L2591

If you look at other uses, they check whether the successfully received response is, in fact, an error response: https://github.com/llvm/llvm-project/blob/538d3552900cfb772b05afcd24b13cff1236f43f/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp#L1887

This is probably why lldb tries to go ahead instead of telling you there was an error.
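
The missing check can be sketched as follows. This is a Python illustration, not the C++ fix: `port_from_launch_response` and `LaunchError` are hypothetical names, and the success payload shape (`pid:<n>;port:<n>;`) is an assumption about what a qLaunchGDBServer reply looks like. The point is only that a reply which was received successfully still has to be inspected before a port is taken from it.

```python
import string


class LaunchError(Exception):
    """Raised when the platform did not provide a usable gdb server port."""


def port_from_launch_response(payload):
    """Extract the port from a launch reply, refusing error replies.

    Assumption: a successful reply is a ';'-separated list of key:value
    pairs that includes a 'port' key.
    """
    is_error = (
        len(payload) == 3
        and payload[0] == "E"
        and all(c in string.hexdigits for c in payload[1:])
    )
    if is_error:
        raise LaunchError(f"platform returned error response {payload}")

    fields = dict(
        item.split(":", 1) for item in payload.split(";") if ":" in item
    )
    port_text = fields.get("port", "")
    if not port_text.isdigit() or int(port_text) == 0:
        raise LaunchError("launch response did not contain a port")
    return int(port_text)


if __name__ == "__main__":
    print(port_from_launch_response("pid:1337;port:43001;"))
    try:
        port_from_launch_response("E09")
    except LaunchError as err:
        print("error:", err)
```

Failing at this point gives the caller a chance to report that the launch failed, rather than carrying a zero port forward into the connection URL.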

unvariant commented 1 year ago

I fixed the problem by building debugserver. It was missing, and that's what caused lldb-server to return the error response.

DavidSpickett commented 1 year ago

We should really have a better way to communicate that. However...

> I followed the [lldb instructions](https://lldb.llvm.org/use/remote.html) and tried it with Xcode's debugserver as well as lldb-server and lldb 16 from brew.

Does that mean something else prevented you from using the Xcode-provided debugserver? If so, I wonder whether we need to document whatever it is.

I will fix the packet handling at least. The error reporting can be improved from there.

mokhaled2992 commented 1 year ago

@DavidSpickett Could you please elaborate on what you did to fix the issue? I listed the steps I followed in https://github.com/llvm/llvm-project/issues/63405#issue-1765320527. What did I miss?

FYI: I use the same steps on my Linux machines, and they work without any issues.

DavidSpickett commented 1 year ago

I haven't fixed anything yet. If I understood @unvariant's comment, they needed to explicitly build debugserver so that it was present on the remote machine. I guess with ninja debugserver.

The thing I am looking to fix is lldb dropping the error silently, which is only half the problem.

I am not very familiar with the Mac setup and don't have a work machine to try it on so...

@JDevlieghere Can you clarify what one needs to do on macOS? Does lldb-server need to find a locally built debugserver, or can it use the one shipped with Xcode? I see there is a build-time option for the latter. https://lldb.llvm.org/resources/build.html#id2

DavidSpickett commented 1 year ago

> The thing I am looking to fix is lldb dropping the error silently, which is only half the problem.

https://reviews.llvm.org/D153513

Maddimax commented 1 year ago

I'm having the same issue, but: If I start:

/usr/bin/lldb-server platform --listen '*:10000' --server

then I get Launch: invalid host:port specification: '[192.168.64.5]' but if I start:

lldb-server platform --listen '*:10000' --server

then everything works and I can debug without any problems.

$ which lldb-server
/usr/bin/lldb-server

Any idea what might be going on?

In lldb I run: command script import test.py

DavidSpickett commented 1 year ago

The result is the same, the cause will be different.

I wasn't able to reproduce your issue remotely or locally unfortunately. I can only guess something to do with the paths and the binary name used when we launch the gdbserver process.

If you are able to rebuild you could instrument GDBRemoteCommunication::StartDebugserverProcess (https://github.com/llvm/llvm-project/blob/e339b07944799ebd1692e8f7019690fe14a33257/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp#L884).

If not perhaps you can run the platform under strace (https://man7.org/linux/man-pages/man1/strace.1.html) to see if it even attempts to start the sub process.

DavidSpickett commented 1 year ago

Also you can run the platform with the process log channel enabled.

$ /mnt/virt_root/build-cross/bin/lldb-server platform --server --listen 0.0.0.0:10000 --log-channels "gdb-remote Process"                                   
Connection established.
GDBRemoteCommunication::StartDebugserverProcess(url=tcp://[192.168.53.1]:0, port=0)
GDBRemoteCommunication::StartDebugserverProcess() found gdb-remote stub exe '/mnt/virt_root/build-cross/bin/lldb-server'
launch info for gdb-remote stub:
Executable: lldb-server
Triple: *-*-*
Arguments:
argv[0]="/mnt/virt_root/build-cross/bin/lldb-server"
argv[1]="gdbserver"
argv[2]="tcp://[192.168.53.1]:0"
argv[3]="--native-regs"
argv[4]="--pipe"
argv[5]="5"
argv[6]=NULL

Maddimax commented 1 year ago

Should have mentioned: I'm running lldb on macOS and lldb-server on linux

Maddimax commented 1 year ago

$ /usr/bin/lldb-server platform --listen '*:10000' --server --log-channels "gdb-remote Process"
Connection established.
GDBRemoteCommunication::StartDebugserverProcess(url=tcp://[192.168.64.1]:0, port=0)
GDBRemoteCommunication::StartDebugserverProcess() could not find gdb-remote stub exe ''
GDBRemoteCommunication::StartDebugserverProcess() failed: unable to locate lldb-server-15.0.7

vs:

$ lldb-server platform --listen '*:10000' --server --log-channels "gdb-remote Process"
Connection established.
GDBRemoteCommunication::StartDebugserverProcess(url=tcp://[192.168.64.1]:0, port=0)
GDBRemoteCommunication::StartDebugserverProcess() found gdb-remote stub exe '/usr/lib/llvm-15/bin/lldb-server-15.0.7'
launch info for gdb-remote stub:
Executable: lldb-server-15.0.7
Triple: *-*-*
Arguments:
argv[0]="/usr/lib/llvm-15/bin/lldb-server-15.0.7"
argv[1]="gdbserver"
argv[2]="tcp://[192.168.64.1]:0"
argv[3]="--native-regs"
argv[4]="--pipe"
argv[5]="6"
argv[6]=NULL

DavidSpickett commented 1 year ago

What might be happening is that lldb-server tries to append the stub name to the "support exe dir". What does --log-channels "gdb-remote Process: lldb host" show you? (I would have mentioned this earlier, but I only just found it myself.)

Something like this is expected:

$ lldb-server platform --server --listen 0.0.0.0:10000 --log-channels "gdb-remote Process: lldb host"
Connection established.
<...>
GDBRemoteCommunication::StartDebugserverProcess(url=tcp://[192.168.53.1]:0, port=0)
shlib dir -> `(empty)`
support exe dir -> `/mnt/virt_root/build-cross/bin/`
GDBRemoteCommunication::StartDebugserverProcess() found gdb-remote stub exe '/mnt/virt_root/build-cross/bin/lldb-server'

If it detects the support exe dir as an empty path, appending lldb-server-15.0.7 to that would work, as that's probably on PATH as well. If it appends that directly to /usr/bin/ it won't be able to run /usr/bin/lldb-server-15.0.7.

Is /usr/bin/lldb-server a symlink by any chance? We might need to resolve that before we proceed in this function.
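
The symlink hypothesis can be illustrated with a small, self-contained Python sketch. It is a simplified model, not LLDB's lookup code: `find_stub` is a hypothetical helper that just looks for the stub next to the path the server was invoked by, and the directory layout is created in a temporary directory to mirror the `/usr/bin` versus `/usr/lib/llvm-15/bin` situation discussed here. It assumes a POSIX system where symlinks can be created.

```python
import os
import tempfile


def find_stub(invoked_path, stub_name):
    """Look for stub_name in the directory of invoked_path.

    os.path.abspath does not resolve symlinks, so a symlinked binary is
    searched for next to the link, not next to the real file.
    """
    support_dir = os.path.dirname(os.path.abspath(invoked_path))
    candidate = os.path.join(support_dir, stub_name)
    return candidate if os.path.isfile(candidate) else None


def demo():
    stub = "lldb-server-15.0.7"
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        real_dir = os.path.join(root, "llvm-15", "bin")  # stands in for the real install dir
        link_dir = os.path.join(root, "bin")             # stands in for /usr/bin
        os.makedirs(real_dir)
        os.makedirs(link_dir)

        real = os.path.join(real_dir, stub)
        open(real, "w").close()
        link = os.path.join(link_dir, "lldb-server")
        os.symlink(real, link)

        via_link = find_stub(link, stub)                    # searched next to the link
        via_real = find_stub(os.path.realpath(link), stub)  # searched next to the target
        return via_link, os.path.basename(via_real)


if __name__ == "__main__":
    print(demo())
```

In this model the lookup through the link finds nothing, while the lookup through the resolved path finds the versioned binary, which is the difference the question about symlinks is probing.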

Maddimax commented 1 year ago

In the error case I see:

...
distribution id command returned "Distributor ID:   Ubuntu
"
distribution id set to "ubuntu"
GDBRemoteCommunication::StartDebugserverProcess(url=tcp://[192.168.64.1]:0, port=0)
shlib dir -> `/usr/bin/`
HostInfo::ComputePathRelativeToLibrary() attempting to derive the path /bin relative to liblldb install path: /usr/bin
HostInfo::ComputePathRelativeToLibrary() derived the path as: /usr/bin
support exe dir -> `/usr/bin/`
GDBRemoteCommunication::StartDebugserverProcess() could not find gdb-remote stub exe ''
GDBRemoteCommunication::StartDebugserverProcess() failed: unable to locate lldb-server-15.0.7
error: lost connection
lldb-server exiting...

I've now changed my code to first resolve all symlinks and call lldb-server that way.

That way I get:

distribution id set to "ubuntu"
GDBRemoteCommunication::StartDebugserverProcess(url=tcp://[192.168.64.1]:0, port=0)
shlib dir -> `/usr/lib/llvm-15/bin/`
HostInfo::ComputePathRelativeToLibrary() attempting to derive the path /bin relative to liblldb install path: /usr/lib/llvm-15/bin
HostInfo::ComputePathRelativeToLibrary() derived the path as: /usr/lib/llvm-15/bin
support exe dir -> `/usr/lib/llvm-15/bin/`
GDBRemoteCommunication::StartDebugserverProcess() found gdb-remote stub exe '/usr/lib/llvm-15/bin/lldb-server-15.0.7'
launch info for gdb-remote stub:
Executable: lldb-server-15.0.7
Triple: *-*-*
Arguments:
argv[0]="/usr/lib/llvm-15/bin/lldb-server-15.0.7"
argv[1]="gdbserver"
argv[2]="tcp://[192.168.64.1]:0"
argv[3]="--native-regs"
argv[4]="--pipe"
argv[5]="6"
argv[6]=NULL

It looks to me like two things should be improved: an error message when lldb-server can't be found, and resolving the symlink when looking for it.
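
A launcher-side version of this workaround might look like the Python sketch below. It is an assumption about how a calling script could do it, not the code actually used here: `resolved_server_command` is a hypothetical helper, and the only lldb-server arguments it uses are the ones already shown in this thread (`platform`, `--listen`, `--server`).

```python
import os
import shutil


def resolved_server_command(name, listen, path=None):
    """Build an lldb-server command line using the symlink-resolved binary.

    `path` optionally overrides the PATH used for lookup, which keeps the
    helper easy to test.
    """
    found = shutil.which(name, path=path)
    if found is None:
        # Fail loudly instead of letting a later step fail obscurely.
        raise FileNotFoundError(f"unable to locate {name}")
    # Resolve symlinks so the server starts from its real install directory.
    return [os.path.realpath(found), "platform", "--listen", listen, "--server"]


if __name__ == "__main__":
    try:
        print(resolved_server_command("lldb-server", "*:10000"))
    except FileNotFoundError as err:
        print("error:", err)
```

The resulting list can be passed to `subprocess.run` or a similar API. Resolving the path in the caller sidesteps the lookup problem until the server handles symlinks itself.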

Should I create a new issue, or is this one fine?

DavidSpickett commented 1 year ago

@Maddimax I've opened https://github.com/llvm/llvm-project/issues/63466 for explaining the error better, please open a new issue for the symlink handling. Include the logs you've already provided.

@wAuner are you still seeing the issue? Perhaps you can try adding --log-channels "gdb-remote Process: lldb host". It could be a missing debugserver or another path issue.

@mokhaled2992 please try the same logging and see if that helps you figure it out.

@unvariant has, I think, solved their issue by building debugserver.

I'll keep this open for a bit while we make sure everyone can at least get connected.

Maddimax commented 1 year ago

Done, and thanks for your help!

DavidSpickett commented 1 year ago

https://github.com/llvm/llvm-project/commit/dfbe3a79e20f1bc51a59ee858fabce792d59c9ae means you will now get a correct but unhelpful error message:

unable to launch a GDB server on <host>

https://github.com/llvm/llvm-project/issues/63466 covers making that explain itself.

If there's no more feedback I'll close this next week, all the work is covered by other issues.

DavidSpickett commented 1 year ago

The misleading error in this scenario has been fixed, and the other improvements have been raised as their own issues. Closing.