ros2 / ros2cli

ROS 2 command line interface tools
Apache License 2.0

ros2cli daemon check fails running on WSL2 #934

Open atyshka opened 2 weeks ago

atyshka commented 2 weeks ago

Bug report

Required Info:

Steps to reproduce issue

  1. Create a new Ubuntu 22.04 VM via WSL2
  2. Enable the "mirrored" network interface by adding the following to .wslconfig in your user directory on Windows:
    [wsl2]
    networkingMode=mirrored
  3. Reboot Windows, start up Ubuntu, and verify mirrored network is enabled with wslinfo --networking-mode from inside the Ubuntu VM
  4. Install ROS Humble
  5. Run ros2 topic list or any other command that uses the daemon.

Expected behavior

The CLI commands work without an error, spawning the daemon if it has not already started.

Actual behavior

The CLI hangs for roughly 2 minutes and then crashes, unless the daemon has been explicitly started. Here is the traceback:

Traceback (most recent call last):
  File "/opt/ros/humble/bin/ros2", line 33, in <module>
    sys.exit(load_entry_point('ros2cli==0.18.11', 'console_scripts', 'ros2')())
  File "/opt/ros/humble/lib/python3.10/site-packages/ros2cli/cli.py", line 91, in main
    rc = extension.main(parser=parser, args=args)
  File "/opt/ros/humble/lib/python3.10/site-packages/ros2topic/command/topic.py", line 41, in main
    return extension.main(args=args)
  File "/opt/ros/humble/lib/python3.10/site-packages/ros2topic/verb/list.py", line 55, in main
    with NodeStrategy(args) as node:
  File "/opt/ros/humble/lib/python3.10/site-packages/ros2cli/node/strategy.py", line 27, in __init__
    if use_daemon and is_daemon_running(args):
  File "/opt/ros/humble/lib/python3.10/site-packages/ros2cli/node/daemon.py", line 75, in is_daemon_running
    return node.connected
  File "/opt/ros/humble/lib/python3.10/site-packages/ros2cli/node/daemon.py", line 46, in connected
    for method in self._proxy.system.listMethods()
  File "/usr/lib/python3.10/xmlrpc/client.py", line 1122, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.10/xmlrpc/client.py", line 1464, in __request
    response = self.__transport.request(
  File "/usr/lib/python3.10/xmlrpc/client.py", line 1166, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.10/xmlrpc/client.py", line 1178, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.10/xmlrpc/client.py", line 1291, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib/python3.10/xmlrpc/client.py", line 1321, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/usr/lib/python3.10/http/client.py", line 942, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.10/socket.py", line 845, in create_connection
    raise err
  File "/usr/lib/python3.10/socket.py", line 833, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

It seems that, instead of failing immediately when there is no daemon to connect to, the connection attempt keeps trying until it times out, and the resulting TimeoutError is not handled gracefully.

It would be simple to add a connection timeout exception handler, but this would still result in a long wait before the timeout is triggered. We'd need to figure out why the connection attempt is timing out instead of getting refused.
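To illustrate the kind of handler described above, here is a minimal sketch, not the actual ros2cli code. It assumes the daemon is reachable through a plain xmlrpc.client.ServerProxy, as the traceback suggests. The names TimeoutTransport and daemon_responds, the 2-second default, and the URL in the usage note are hypothetical choices made for this example.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """XML-RPC transport whose HTTP connection uses a short socket timeout.

    Hypothetical helper for illustration; it is not part of ros2cli.
    """

    def __init__(self, timeout=2.0, **kwargs):
        super().__init__(**kwargs)
        self._timeout = timeout

    def make_connection(self, host):
        # The base class returns an http.client.HTTPConnection; its
        # timeout attribute is applied when the socket is opened.
        conn = super().make_connection(host)
        conn.timeout = self._timeout
        return conn


def daemon_responds(url, timeout=2.0):
    """Return True if an XML-RPC server answers at url within timeout."""
    proxy = xmlrpc.client.ServerProxy(
        url, transport=TimeoutTransport(timeout=timeout))
    try:
        proxy.system.listMethods()
        return True
    except OSError:
        # Covers ConnectionRefusedError (nothing listening) as well as
        # TimeoutError / socket.timeout (request silently dropped).
        return False
```

Under these assumptions, a call such as daemon_responds("http://127.0.0.1:11511/") would return False after about two seconds instead of hanging for two minutes and raising. This only shortens the wait; it does not explain why the connection times out rather than being refused.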

Additional information

I am using the mirrored network option on WSL2 because the default NAT mode does not allow multicast and thus cannot communicate with external ROS 2 machines on the LAN. I should note that mirrored networking seems to work for everything I've tested in ROS 2 other than this CLI tool.

If I instead use the default NAT mode of WSL2, ros2cli does not hit this error, but then ROS 2 does not function correctly over the LAN (because NAT mode doesn't support multicast). It seems that the request to port 11511 somehow gets lost rather than rejected. On the Windows side I have checked with netstat whether anything is using that port, but I can't find anything.
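The "lost rather than rejected" behavior can be checked from inside the Ubuntu VM with a raw TCP probe. The sketch below is a hedged diagnostic, not part of ros2cli. The function name probe_port is hypothetical, and the loopback address in the usage note is an assumption about where the daemon would listen; only port 11511 comes from this report.

```python
import socket


def probe_port(host, port, timeout=2.0):
    """Classify a TCP connection attempt as open, refused, or timed out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: nothing is listening.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout: the request was dropped.
        return "timed out"
    except OSError as exc:
        return f"error: {exc}"
```

Calling probe_port("127.0.0.1", 11511) with no daemon running would be expected to report "refused" on a typical Linux host. A "timed out" result would match the behavior described in this issue.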

I am not sure if you care to maintain WSL support, but it seems like a good alternative to native Windows that works well in most cases. If there is a simple fix for this issue, WSL would be a great solution for my needs and I'd be happy to share my setup with the community.

atyshka commented 2 weeks ago

Update: the root cause seems to be tracked in microsoft/WSL#10855, so this is definitely not a problem we can fix ourselves. I would prefer to keep this issue open and close it when the relevant fix is implemented on the Microsoft side, but you can close it if you want.