I have investigated this issue a little. The variable `port` is initialized to 0 and passed to `LaunchGDBServer()`:
https://github.com/llvm/llvm-project/blob/0870afaaaccde5b4bae37abfc982207ffafb8332/lldb/tools/lldb-server/lldb-platform.cpp#L355-L359
It seems the condition `if (!port)` is never met, because `std::optional<uint16_t> port` always has a value. This is a bug and needs to be fixed.
https://github.com/llvm/llvm-project/blob/0870afaaaccde5b4bae37abfc982207ffafb8332/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp#L160-L169
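A minimal standalone sketch (not the LLDB code itself) of why that check never fires: an optional constructed from the literal 0 holds a value, so its boolean conversion is true.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>

int main() {
  // Mirrors the initialization in lldb-platform.cpp: constructing the
  // optional from the literal 0 means it holds a value (0); it is not empty.
  std::optional<uint16_t> port = 0;

  if (!port) {
    // Never reached: operator bool() tests has_value(), not the value itself.
    std::puts("would ask the port map for a port here");
  } else {
    std::printf("port already 'set' to %u, the port map is never consulted\n",
                static_cast<unsigned>(*port));
  }
  return 0;
}
```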
Then `port = 0` is always passed to `StartDebugserverProcess()`, called from `GDBRemoteCommunicationServerPlatform::LaunchGDBServer()`. But `StartDebugserverProcess()` does not use the passed port value anyway; it overwrites the port with a new value (`child_port`).
https://github.com/llvm/llvm-project/blob/76e37b1a08906620537440ebcd5162697079cba5/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp#L1159-L1171
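A rough sketch of that data flow as I read it (the helper names below are hypothetical, not the real signatures): the incoming value has no effect and `port` comes back holding whatever the child actually bound.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for the real launch/pipe machinery.
void LaunchDebugserver(uint16_t requested_port);
uint16_t ReadPortFromNamedPipe(); // the child reports the port it bound to

// Sketch of the behaviour described above: the caller's port value is not
// honoured, and on return `port` holds the child's port (child_port), which
// the caller's port map knows nothing about.
void StartDebugserverProcessSketch(std::optional<uint16_t> &port) {
  LaunchDebugserver(port.value_or(0)); // requested value is effectively ignored
  uint16_t child_port = ReadPortFromNamedPipe();
  port = child_port;
}
```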
Note that `m_port_map.AssociatePortWithProcess(*port, pid)` will silently fail here, because the new port value is missing from `portmap_for_child`.
https://github.com/llvm/llvm-project/blob/7b135f7c0881ef0718c5c83e4d8556c5fdb32d86/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp#L209-L221
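Roughly, the port map behaves like this simplified model (illustrative only, not the real `PortMap` class): a port can only be associated with a PID if it was added to the map beforehand, so associating the child's freshly chosen port does nothing.

```cpp
#include <cstdint>
#include <map>

// Simplified model of the platform's port map.
class PortMapSketch {
  std::map<uint16_t, uint64_t> m_map; // port -> pid (0 means free)

public:
  // Ports are added up front, e.g. from --min/--max-gdbserver-port.
  void AllowPort(uint16_t port) { m_map[port] = 0; }

  // Fails quietly if the port was never added to the map, which is exactly
  // what happens with the child's real port (child_port).
  bool AssociatePortWithProcess(uint16_t port, uint64_t pid) {
    auto it = m_map.find(port);
    if (it == m_map.end())
      return false;
    it->second = pid;
    return true;
  }
};
```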
As a result, `gdbserver_portmap` is useless, because it does not reflect the port actually used.
`StartDebugserverProcess()` will only try to listen on 127.0.0.1:0 if `url` is nullptr, but `url` is always set when the protocol is TCP. Otherwise pipes are used (binding to port zero).
Here is the log of the lldb-server processes (parent and child) on the remote Linux machine:

```
/home/ubuntu/lldb-server p --log-channels lldb all --listen *:1234 --server --min-gdbserver-port 1236 --max-gdbserver-port 1240
/home/ubuntu/lldb-server gdbserver tcp://[10.1.1.170]:0 --native-regs --pipe 6
```
Note that the port 0 in the URL tcp://[10.1.1.170]:0 is now a bug, but any port in this URL would be ignored anyway.
I don't see where the --min-gdbserver-port, --max-gdbserver-port and --gdbserver-port values are really used. Do we still need them? --port-offset is not used either.
It is probably better to revert #88845, since the port mapping does not work as expected anyway. But #88845 caused test failures on cross builds.
> I don't see where the --min-gdbserver-port, --max-gdbserver-port and --gdbserver-port values are really used. Do we still need them? --port-offset is not used either.
Start from lldb/tools/lldb-server/lldb-platform.cpp instead. They should be doing something if the user provided them; they are intended for situations where you don't have all ports available between the host and the VM. They've always been a bit dodgy, but they are useful for working on simulators.
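For example (an illustrative setup, not taken from this issue): a guest behind QEMU user-mode networking where only a few host ports are forwarded, so the platform must be restricted to handing out exactly those ports.

```sh
# Host: forward the platform port plus a small range of gdbserver ports.
qemu-system-aarch64 ... \
  -netdev user,id=net0,hostfwd=tcp::1234-:1234,hostfwd=tcp::1236-:1236,hostfwd=tcp::1237-:1237 \
  -device virtio-net-pci,netdev=net0

# Guest: only hand out gdbserver ports that are actually forwarded.
lldb-server platform --server --listen "*:1234" \
  --min-gdbserver-port 1236 --max-gdbserver-port 1237
```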
Currently the port in `url` is always 0, because `std::optional<uint16_t> port` is initialized here and `LaunchGDBServer()` does not request a port from the port map at all:
https://github.com/llvm/llvm-project/blob/0870afaaaccde5b4bae37abfc982207ffafb8332/lldb/tools/lldb-server/lldb-platform.cpp#L355
Note that it is usually better to configure simulators (QEMU) to use a network bridge with its own IP and no port limits, rather than mapping ports to the host's ports manually.
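For instance (illustrative only, assuming a br0 bridge already exists on the host), with bridged networking the guest gets its own IP and lldb-server can bind any port without forwarding:

```sh
qemu-system-aarch64 ... \
  -netdev bridge,id=net0,br=br0 \
  -device virtio-net-pci,netdev=net0
```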
Note: I'm working on a patch to fix this issue and all the related port-mapping issues. The idea is to use threads for platform mode and a common port map for all connections, instead of making a new `portmap_for_child` with one port.
To reproduce, the remote needs to be something relatively slow; in my case, QEMU.
Start an lldb-server on the remote in platform mode with some restricted ports:
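Something along these lines (the exact ports here are illustrative):

```sh
lldb-server platform --server --listen "*:1234" \
  --min-gdbserver-port 1235 --max-gdbserver-port 1236
```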
Then connect lldb and run a program:
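For example (the remote address and binary are placeholders):

```
(lldb) platform select remote-linux
(lldb) platform connect connect://<remote-ip>:1234
(lldb) target create ./a.out
(lldb) run
```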
At this point lldb-server has started a new process to handle this client's connection and given it a port map with one open port, which it has used to start a gdbserver process.
From here, if you finish the program then run it again, everything is fine. The gdbserver is torn down and the port is freed, then reused for the new gdbserver.
However, if you run again before the program finishes, there is a race between the platform killing the gdbserver process and the platform handling the launch-gdbserver request packet:
We can see that the packets are sent in the right order:
On the server side it uses `kill()` to kill the process and should then free the port here: https://github.com/llvm/llvm-project/blob/76c84e702bd9af7db2bb9373ba6de0508f1e57a9/lldb/tools/lldb-server/lldb-platform.cpp#L300 (introduced by https://github.com/llvm/llvm-project/pull/88845, which I think made the feature overall better, but in doing so exposed this issue).
If the remote is slow enough that the kill doesn't actually complete before we get back there, the port is not freed in time; the platform then tries to find a free port, finds none, and fails to launch the gdbserver. Some ad-hoc logging on the server side confirms this.
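Here is a rough model of the ordering problem (toy code, not LLDB): the port is only marked free once the old gdbserver's exit has been noticed, so on a slow remote the next launch request can arrive while the connection's single port still looks busy.

```cpp
#include <cstdio>

// Toy model of the one-port port map handed to a platform connection.
struct SinglePortMap {
  unsigned port = 1236; // the only gdbserver port this connection may use
  bool in_use = true;   // still associated with the previous gdbserver

  bool TryAllocate(unsigned &out) {
    if (in_use)
      return false; // "no free port": the launch fails
    out = port;
    in_use = true;
    return true;
  }
  void FreeOldPort() { in_use = false; } // runs once the old child is reaped
};

int main() {
  SinglePortMap map;
  unsigned p = 0;
  // Slow remote: the qLaunchGDBServer for the second run is handled before
  // the kill of the first gdbserver has been observed...
  if (!map.TryAllocate(p))
    std::puts("failed to launch gdbserver: no free ports");
  map.FreeOldPort(); // ...so the port is freed too late.
  return 0;
}
```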
This does not happen when debugging locally on real hardware.
I discovered this while running some of the SVE tests again. They run the debuggee once to discover the supported vector lengths, and then again to run the actual test case. Adding a `sleep(5)` in the test cases between those two runs also "fixes" the issue. The workaround is to not use a port map at all, but giving full network access to a VM is often difficult. So I'd like to find a way to make this work, or at least work around it in the tests most likely to be run this way.