ros2 / ros2cli

ROS 2 command line interface tools
Apache License 2.0

ROS2 Daemon fails to respond #702

Open civerachb-cpr opened 2 years ago

civerachb-cpr commented 2 years ago

Bug report

Summary

Running ros2 topic list once causes all subsequent ros2 * commands to fail with the following error:

Traceback (most recent call last):
  File "/opt/ros/foxy/bin/ros2", line 11, in <module>
    load_entry_point('ros2cli==0.9.11', 'console_scripts', 'ros2')()
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2cli/cli.py", line 67, in main
    rc = extension.main(parser=parser, args=args)
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2topic/command/topic.py", line 41, in main
    return extension.main(args=args)
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2topic/verb/list.py", line 38, in main
    with NodeStrategy(args) as node:
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2cli/node/strategy.py", line 52, in __enter__
    self._daemon_node.__enter__()
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2cli/node/daemon.py", line 116, in __enter__
    methods = self._proxy.system.listMethods()
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1109, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1450, in __request
    response = self.__transport.request(
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1153, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1166, in single_request
    resp = http_conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 285, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

Running ros2 daemon stop fails with the same error; the only way to stop the daemon is to run killall _ros2_daemon.

I am able to run ros2 topic echo .... and ros2 topic pub ... successfully, as long as I run them before running ros2 topic list. As soon as I run ros2 topic list once, all subsequent commands fail with the error above.

Required Info:

Steps to reproduce issue

I have 2 computers installed in my robot, both running Ubuntu 20.04. One publishes a ROS2 API that I need to expose to ROS Noetic on the other. For nomenclature purposes:

The main PC has the ros1_bridge package installed on it, though at this point I have not even been able to use it.

Both PCs are configured to use CycloneDDS and have the following envars set:

export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
export CYCLONEDDS_URI=file:///path/to/cyclone_dds.xml

cyclone_dds.xml contents:

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
  <Domain id="any">
    <General>
      <NetworkInterfaceAddress>eno1</NetworkInterfaceAddress>
      <AllowMulticast>true</AllowMulticast>
    </General>
    <Internal>
      <LeaseDuration>5 min</LeaseDuration>
    </Internal>
  </Domain>
</CycloneDDS>

Both computers have eno1 statically configured to unique IP addresses on the same subnet. I've tried 2 different subnets in case there was a conflict with something else on my network, but there was no effect.

Expected behavior

Running ros2 topic list multiple times should work; running ros2 topic echo ... after running ros2 topic list should echo the specified topic; and ros2 daemon stop should be able to stop the daemon.

Actual behavior

As soon as I run ros2 topic list once all subsequent commands fail with http.client.RemoteDisconnected: Remote end closed connection without response. Killing the ROS2 daemon with killall _ros2_daemon temporarily fixes the problem, until I run ros2 topic list again.

If you need any additional information, or want me to run additional tests on one/both computers please let me know and I'll provide whatever information I can.

Thanks a lot for the help! I'm hopeful that this is just something simple I've misconfigured and there's an easy fix.

fujitatomoya commented 2 years ago

I am not sure what went wrong, but i cannot reproduce the problem on my local environment.

the following commands ran for more than 30 minutes without any error with https://github.com/ros2/ros2/commit/9597c36591c643a1ff6d87964ddd3d999f191baf. (both rmw_cyclonedds_cpp and rmw_fastrtps_cpp)

# (terminal-1) restart ros2 daemon and ros2 topic list loop
ros2 daemon stop
ros2 daemon start
while true; do ros2 topic list; done

# (terminal-2) ros2 topic pub
ros2 topic pub /test std_msgs/msg/String data:\ \'hogehogehoge\'

# (terminal-3) ros2 topic echo
ros2 topic echo /test

civerachb-cpr commented 2 years ago

I'll try wiping the whole machine and starting again. I'm just really confused because on this specific machine the problem is always reproducible. But I don't understand enough about the problem to know where I should even start looking on this robot to figure out what the cause is.

fujitatomoya commented 2 years ago

how about starting ros2 daemon in debug mode?

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 daemon start --debug
The daemon has been started
root@tomoyafujita:~/ros2_ws/colcon_ws# Interface kind: 2, info: [('43.135.133.3', 'enp4s0', True)]
Interface kind: 10, info: [('fe80::5:73ff:fea0:6', 'enp4s0', True)]
Addresses by interfaces: {2: {'enp4s0': '43.135.133.118'}, 10: {'enp4s0': '2607:fd28:130:e09:0:dddd:1:1e7'}}
Serving XML-RPC on http://127.0.0.1:11511/ros2cli/
get_topic_names_and_types()
Interface kind: 2, info: [('43.135.133.3', 'enp4s0', True)]
Interface kind: 10, info: [('fe80::5:73ff:fea0:6', 'enp4s0', True)]
Addresses by interfaces: {2: {'enp4s0': '43.135.133.118'}, 10: {'enp4s0': '2607:fd28:130:e09:0:dddd:1:1e7'}}
get_topic_names_and_types()
Interface kind: 2, info: [('43.135.133.3', 'enp4s0', True)]
Interface kind: 10, info: [('fe80::5:73ff:fea0:6', 'enp4s0', True)]
Addresses by interfaces: {2: {'enp4s0': '43.135.133.118'}, 10: {'enp4s0': '2607:fd28:130:e09:0:dddd:1:1e7'}}
get_publishers_info_by_topic('/test')
Interface kind: 2, info: [('43.135.133.3', 'enp4s0', True)]
Interface kind: 10, info: [('fe80::5:73ff:fea0:6', 'enp4s0', True)]
Addresses by interfaces: {2: {'enp4s0': '43.135.133.118'}, 10: {'enp4s0': '2607:fd28:130:e09:0:dddd:1:1e7'}}
get_topic_names_and_types()
Interface kind: 2, info: [('43.135.133.3', 'enp4s0', True)]
Interface kind: 10, info: [('fe80::5:73ff:fea0:6', 'enp4s0', True)]
Addresses by interfaces: {2: {'enp4s0': '43.135.133.118'}, 10: {'enp4s0': '2607:fd28:130:e09:0:dddd:1:1e7'}}
get_name()
Interface kind: 2, info: [('43.135.133.3', 'enp4s0', True)]
Interface kind: 10, info: [('fe80::5:73ff:fea0:6', 'enp4s0', True)]
Addresses by interfaces: {2: {'enp4s0': '43.135.133.118'}, 10: {'enp4s0': '2607:fd28:130:e09:0:dddd:1:1e7'}}
get_namespace()
Interface kind: 2, info: [('43.135.133.3', 'enp4s0', True)]
Interface kind: 10, info: [('fe80::5:73ff:fea0:6', 'enp4s0', True)]
Addresses by interfaces: {2: {'enp4s0': '43.135.133.118'}, 10: {'enp4s0': '2607:fd28:130:e09:0:dddd:1:1e7'}}

if we issue ros2 topic list from another terminal, the daemon's XML-RPC server should receive the request and call the get_topic_names_and_types method.

civerachb-cpr commented 2 years ago

Terminal 1:

$ ros2 daemon start --debug
Interface kind: 2, info: [('10.27.0.1', 'wlp2s0', True)]
Addresses by interfaces: {2: {'wlp2s0': '10.27.15.57'}}
Serving XML-RPC on localhost:11621/ros2cli/
The daemon has been started

Terminal 2:

$ ros2 topic list
Traceback (most recent call last):
  File "/opt/ros/foxy/bin/ros2", line 11, in <module>
    load_entry_point('ros2cli==0.9.11', 'console_scripts', 'ros2')()
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2cli/cli.py", line 67, in main
    rc = extension.main(parser=parser, args=args)
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2topic/command/topic.py", line 41, in main
    return extension.main(args=args)
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2topic/verb/list.py", line 38, in main
    with NodeStrategy(args) as node:
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2cli/node/strategy.py", line 52, in __enter__
    self._daemon_node.__enter__()
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2cli/node/daemon.py", line 116, in __enter__
    methods = self._proxy.system.listMethods()
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1109, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1450, in __request
    response = self.__transport.request(
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1153, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1166, in single_request
    resp = http_conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 285, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

I'm not sure why I only seem to get information for the wireless card; the PC has multiple ethernet ports, but none are showing up in the daemon's output.

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP group default qlen 1000
    link/ether c4:00:ad:44:1b:55 brd ff:ff:ff:ff:ff:ff
    inet 169.254.118.11/16 brd 169.254.255.255 scope global noprefixroute enp3s0
       valid_lft forever preferred_lft forever
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether c4:00:ad:44:1b:54 brd ff:ff:ff:ff:ff:ff
    inet 10.252.252.100/16 brd 10.252.255.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet 169.254.193.253/16 brd 169.254.255.255 scope global noprefixroute eno1
       valid_lft forever preferred_lft forever
4: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b8:b7:f1:68:9b:eb brd ff:ff:ff:ff:ff:ff
    inet 10.27.15.57/16 brd 10.27.255.255 scope global dynamic wlp2s0
       valid_lft 82706sec preferred_lft 82706sec
    inet 10.27.15.61/16 brd 10.27.255.255 scope global secondary noprefixroute wlp2s0
       valid_lft forever preferred_lft forever
5: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c4:00:ad:44:1b:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.131.1/24 brd 192.168.131.255 scope global br0
       valid_lft forever preferred_lft forever
    inet 169.254.201.239/16 brd 169.254.255.255 scope global noprefixroute br0
       valid_lft forever preferred_lft forever

Cyclone DDS configuration:

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
  <Domain id="any">
    <General>
      <NetworkInterfaceAddress>eno1</NetworkInterfaceAddress>
      <AllowMulticast>true</AllowMulticast>
    </General>
    <Internal>
      <LeaseDuration>5 min</LeaseDuration>
    </Internal>
  </Domain>
</CycloneDDS>

civerachb-cpr commented 2 years ago

Been looking into this further and I'm still no closer to finding an answer.

The ros2 daemon appears to start correctly and is listening on port 11511:

$ ros2 daemon start --debug
Interface kind: 2, info: [('10.27.0.1', 'wlp2s0', True)]
Addresses by interfaces: {2: {'wlp2s0': '10.27.15.27'}}
Serving XML-RPC on localhost:11511/ros2cli/
The daemon has been started

$ nmap localhost -p 11511
Starting Nmap 7.80 ( https://nmap.org ) at 2022-03-25 13:38 EDT
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000086s latency).
Other addresses for localhost (not scanned): ::1

PORT      STATE SERVICE
11511/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.05 seconds

$ sudo netstat -tnlp | grep :11511
tcp        0      0 127.0.0.1:11511         0.0.0.0:*               LISTEN      3700/python3

$ ps 3700
    PID TTY      STAT   TIME COMMAND
   3700 pts/1    Sl     0:00 /usr/bin/python3 /opt/ros/foxy/bin/_ros2_daemon --rmw-implementation rmw_cyclonedds_cpp --ros-domain-id 0

But if I run the following Python code in the interpreter I'm seeing virtually the same error as with ros2...:

import xmlrpc.client
with xmlrpc.client.ServerProxy("http://localhost:11511/") as proxy:
     proxy.listMethods()

Result:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1109, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1450, in __request
    response = self.__transport.request(
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1153, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1166, in single_request
    resp = http_conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 285, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
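
For anyone repeating this check, it can help to tell apart "nothing is listening" from "something accepted the connection and then dropped it", since nmap and netstat above show the port is open. The following is only a rough sketch: probe_xmlrpc is a hypothetical helper, not part of ros2cli, and the status labels are made up for illustration. It calls system.listMethods, the same method that appears in the ros2cli traceback.

```python
import http.client
import xmlrpc.client


def probe_xmlrpc(url):
    """Return (status, detail) for a system.listMethods call against url."""
    try:
        with xmlrpc.client.ServerProxy(url) as proxy:
            return 'ok', proxy.system.listMethods()
    except ConnectionRefusedError as exc:
        # Nothing is listening on that address and port.
        return 'refused', exc
    except (ConnectionError, http.client.HTTPException) as exc:
        # The connection was accepted, then closed or reset without a reply.
        return 'dropped', exc
    except xmlrpc.client.Error as exc:
        # The server answered, but the HTTP or XML-RPC call itself failed.
        return 'rpc-error', exc
```

Usage would look like probe_xmlrpc('http://127.0.0.1:11511/ros2cli/'), taking the URL from the daemon's "Serving XML-RPC on ..." debug line. A 'dropped' result would be consistent with the RemoteDisconnected and BrokenPipeError tracebacks in this thread.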

On the outside chance it's a problem with ufw I've stopped & disabled the service:

sudo systemctl stop ufw
sudo systemctl disable ufw

No change.

I do have some necessary port-forwarding enabled, but nothing on port 11511:

$ sudo iptables-save
# Generated by iptables-save v1.8.4 on Fri Mar 25 13:42:10 2022
*filter
:INPUT ACCEPT [978655:543737458]
:FORWARD ACCEPT [6:1998]
:OUTPUT ACCEPT [827186:195629203]
COMMIT
# Completed on Fri Mar 25 13:42:10 2022
# Generated by iptables-save v1.8.4 on Fri Mar 25 13:42:10 2022
*nat
:PREROUTING ACCEPT [512:65118]
:INPUT ACCEPT [107:12635]
:OUTPUT ACCEPT [387:35844]
:POSTROUTING ACCEPT [0:0]
-A PREROUTING -p tcp -m tcp --dport 2000 -j DNAT --to-destination 10.252.252.1:2000
-A PREROUTING -p tcp -m tcp --dport 5000 -j DNAT --to-destination 10.252.252.1:5000
-A PREROUTING -p tcp -m tcp --dport 9091 -j DNAT --to-destination 10.252.252.1:9091
-A PREROUTING -p tcp -m tcp --dport 2001 -j DNAT --to-destination 10.252.252.1:2001
-A POSTROUTING -j MASQUERADE
COMMIT
# Completed on Fri Mar 25 13:42:10 2022

I even tried uninstalling the Foxy packages and installing Galactic instead. Same result; the ros2 daemon's XML-RPC server simply refuses connections. Starting the daemon as root also has no effect (again, not that I expected it would).

civerachb-cpr commented 2 years ago

Possible progress! I cleared my iptables configuration completely by running

#!/bin/bash
sudo sed -i 's/^#net.ipv4.ip_forward=.*/net.ipv4.ip_forward=0/g' /etc/sysctl.conf
sudo sed -i 's/^net.ipv4.ip_forward=1/net.ipv4.ip_forward=0/g' /etc/sysctl.conf
sudo sysctl net.ipv4.ip_forward=0
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -F
sudo iptables -X
sudo dpkg-reconfigure iptables-persistent

Then I started the ROS2 daemon, and ran ros2 topic list in between every one of the following commands:

REMOTE_HOST="10.252.252.1"
sudo sed -i 's/^#net.ipv4.ip_forward=.*/net.ipv4.ip_forward=1/g' /etc/sysctl.conf
sudo sed -i 's/^net.ipv4.ip_forward=0/net.ipv4.ip_forward=1/g' /etc/sysctl.conf
sudo sysctl net.ipv4.ip_forward=1
sudo iptables --policy FORWARD ACCEPT
sudo iptables -t nat -A PREROUTING -p tcp --dport 2000 -j DNAT --to-destination ${REMOTE_HOST}:2000
sudo iptables -t nat -A PREROUTING -p tcp --dport 2001 -j DNAT --to-destination ${REMOTE_HOST}:2001
sudo iptables -t nat -A PREROUTING -p tcp --dport 5000 -j DNAT --to-destination ${REMOTE_HOST}:5000
sudo iptables -t nat -A PREROUTING -p tcp --dport 9091 -j DNAT --to-destination ${REMOTE_HOST}:9091
sudo iptables -t nat -A POSTROUTING -j MASQUERADE
sudo dpkg-reconfigure iptables-persistent

As soon as I ran sudo iptables -t nat -A POSTROUTING -j MASQUERADE the Python error occurred again. So it looks like the issue has to do with enabling masquerade postrouting.

Anyone have any idea why masquerade postrouting would have this effect on the daemon?

civerachb-cpr commented 2 years ago

Smaller, isolated example. Installation procedure: 1) Install Ubuntu 20.04 + ros-foxy-ros-base package 2) Run the following script:

# start the ROS2 daemon
ros2 daemon start --debug

# check that you can list topics
ros2 daemon status
ros2 topic list

# enable port-forwarding. for the sake of example i'm just forwarding HTTP traffic to another server
# on my local network. replace the destination & port numbers as necessary
sudo sysctl net.ipv4.ip_forward=1
sudo iptables --policy FORWARD ACCEPT
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 10.25.0.10:80
sudo iptables -t nat -A POSTROUTING -j MASQUERADE

# Use an external tool (web browser, ftp client, ssh client, etc... as necessary) to verify the port-forwarding is working

# try ros2 commands again:
ros2 daemon status          # should work
ros2 topic list             # should fail, see below for output
ros2 daemon stop            # should fail with same output

# disable port-forwarding again
sudo sysctl net.ipv4.ip_forward=0
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -F
sudo iptables -X

# try ros2 commands again:
ros2 daemon status          # still works
ros2 topic list             # works again
ros2 daemon stop            # works

When the ROS2 commands fail with port-forwarding enabled, I'm seeing this output:

Traceback (most recent call last):
  File "/opt/ros/foxy/bin/ros2", line 11, in <module>
    load_entry_point('ros2cli==0.9.11', 'console_scripts', 'ros2')()
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2cli/cli.py", line 67, in main
    rc = extension.main(parser=parser, args=args)
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2topic/command/topic.py", line 41, in main
    return extension.main(args=args)
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2topic/verb/list.py", line 38, in main
    with NodeStrategy(args) as node:
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2cli/node/strategy.py", line 52, in __enter__
    self._daemon_node.__enter__()
  File "/opt/ros/foxy/lib/python3.8/site-packages/ros2cli/node/daemon.py", line 116, in __enter__
    methods = self._proxy.system.listMethods()
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1109, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1450, in __request
    response = self.__transport.request(
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1153, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1165, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1278, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1308, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1050, in _send_output
    self.send(chunk)
  File "/usr/lib/python3.8/http/client.py", line 972, in send
    self.sock.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

Upon removing the port-forwarding that error goes away immediately.

I'm reliably able to prevent the ROS2 Daemon from responding by enabling basic port-forwarding. I'm not sure if this is a bug with ros2cli itself, or something upstream with Python's XML-RPC library.

v-lopez commented 2 years ago

I am experiencing this as well. It is easy to work around with: sudo iptables -t nat -F

And to reproduce again with: sudo iptables -t nat -A POSTROUTING -j MASQUERADE

For me disabling masquerading is not an option and this is a complete show stopper.

@fujitatomoya if you run the ros2 daemon in debug mode, you'll see that it does not receive any call such as get_topic_names_and_types().

clalancette commented 2 years ago

I did a bit of research here. The earlier examples give some clues, but there is actually a much easier way to reproduce this without (directly) involving ROS 2 at all. Here are two example programs:

xmlrpc-client.py:

import xmlrpc.client
with xmlrpc.client.ServerProxy("http://127.0.0.1:11511/ros2cli/") as proxy:
    print(proxy.system.listMethods())

xmlrpc-server.py:

import socket
import struct
# Import SimpleXMLRPCRequestHandler to re-export it.
from xmlrpc.server import SimpleXMLRPCRequestHandler  # noqa
from xmlrpc.server import SimpleXMLRPCServer

class LocalXMLRPCServer(SimpleXMLRPCServer):

    allow_reuse_address = False

    def server_bind(self):
        # Prevent listening socket from lingering in TIME_WAIT state after close()
        self.socket.setsockopt(
            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))
        super(LocalXMLRPCServer, self).server_bind()

    def get_request(self):
        # Prevent accepted socket from lingering in TIME_WAIT state after close()
        sock, addr = super(LocalXMLRPCServer, self).get_request()
        sock.setsockopt(
            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))
        return sock, addr

    def verify_request(self, request, client_address):
        print("verifying request, address is:", client_address)
        if client_address[0] != '127.0.0.1':
            return False
        return super(LocalXMLRPCServer, self).verify_request(request, client_address)

class RequestHandler(SimpleXMLRPCRequestHandler):
    rpc_paths = ('/ros2cli/',)

server = LocalXMLRPCServer(('127.0.0.1', 11511), logRequests=False, requestHandler=RequestHandler, allow_none=True)

server.register_introspection_functions()

shutdown = False
def shutdown_handler():
    global shutdown
    print('Remote shutdown requested')
    shutdown = True
server.register_function(shutdown_handler, 'system.shutdown')

while not shutdown:
    server.handle_request()

(the server is probably not the minimum example, but it is good enough for the example here)

If I run xmlrpc-server.py in one terminal, and then run xmlrpc-client.py in a separate terminal with a clear NAT iptables table, things work just fine and I get a method listing. If I then enable NAT by running sudo iptables -t nat -A POSTROUTING -j MASQUERADE, I then get an exception trying to connect.

Looking closely, the problem is in the server verify_request function (which I copied out of https://github.com/ros2/ros2cli/blob/f4e5952f430e502060594d68f3a050b785fd4249/ros2cli/ros2cli/xmlrpc/local_server.py#L39-L42). It is looking to see if the request is coming from 127.0.0.1, and rejecting it otherwise. The problem is that when MASQUERADING is on, those requests actually come from a different address on the local machine (192.168.20.72 in my case). So now we know exactly what the problem is.
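
A quick way to check whether a given machine is affected, without running the daemon at all, is to look at the source address that a loopback listener reports for a local connection. This is a minimal sketch under the assumption that the rewritten address is visible to accept(), as the verify_request output above suggests; observed_loopback_peer is a hypothetical helper name.

```python
import socket


def observed_loopback_peer():
    """Connect to a throwaway 127.0.0.1 listener and return the source IP it sees."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.settimeout(5)
        listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        with socket.create_connection(('127.0.0.1', port), timeout=5):
            # The handshake completes in the kernel backlog, so accept() returns at once.
            conn, peer = listener.accept()
            conn.close()
    return peer[0]


if __name__ == '__main__':
    print('source address seen by the listener:', observed_loopback_peer())
```

If this prints anything other than 127.0.0.1 while the MASQUERADE rule is active, the daemon's verify_request check would reject the same connection.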

The solution is a bit trickier. I guess we could consider just removing that check completely; presumably the fact that we only bind to 127.0.0.1 is good enough to keep external users from querying the daemon. As a maybe slightly safer option, we could attempt to enumerate all local interfaces and allow the request to come from all of them. Not sure how we would do that on Windows, though. This needs a bit more thought.

civerachb-cpr commented 2 years ago

Thanks for the update! I'm glad the source of the problem has at least been identified, even if the fix is more complicated.

In my specific instance I was able to bypass needing to use iptables by installing apache2 and configuring it to act as a proxy server for the resources I would otherwise have been port-forwarding. Not ideal, but it worked in my specific use-case.

sloretz commented 2 years ago

The solution is a bit trickier.

Brainstorming crazy ideas ok? What if we avoided the network and used SharedMemory to communicate with the daemon?

A different idea with shared memory: What if we generated a secret value, stored it in shared memory, and included that in the request to the daemon? The daemon would only process the request if it contained the right value.
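
To make the secret-value idea concrete, here is a rough sketch of just the token part. It deliberately leaves out where the token is stored (shared memory or otherwise) and how it travels in the request; new_token and request_is_authorized are hypothetical names, not anything in ros2cli.

```python
import hmac
import secrets


def new_token():
    """Generate a random secret for the daemon to publish to local clients."""
    return secrets.token_hex(32)  # 32 random bytes, hex-encoded


def request_is_authorized(presented, expected):
    """Return True only if the token sent with a request matches the daemon's."""
    # compare_digest runs in constant time, so timing does not leak the token.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

With something like this the daemon would no longer need to care about the source address at all; it would only process requests that carry the right token.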

Not sure how we would do that on Windows

Is the problem relevant on Windows? I can't say I understand how masquerading works, but I only see references to it on BSD and Linux.

clalancette commented 2 years ago

Brainstorming crazy ideas ok?

Absolutely!

What if we avoided the network and used SharedMemory to communicate with the daemon?

I generally like this idea; it should get us away from XMLRPC (which has other problems besides this), and should also be faster. I guess we'd have to see how it worked on Windows, but assuming it worked there I'd be for that.

Is the problem relevant on Windows? I can't say I understand how masquerading works, but I only see references to it on BSD and Linux.

As you say, MASQUERADING as such doesn't exist on Windows. But the more general problem of the source IP address != '127.0.0.1' can exist on any TCP connection, so presumably we could construct a similar problem on Windows (though I have no idea how to do so).

fujitatomoya commented 2 years ago

we could attempt to enumerate all local interfaces and allow the request to come from all of them. Not sure how we would do that on Windows

would the following not work on Windows? i do not use Windows at all, so i am not sure either.

import netifaces

interfaces = netifaces.interfaces()
for interface in interfaces:
    addrs = netifaces.ifaddresses(interface)
    if netifaces.AF_INET in addrs.keys():
        for value in addrs[netifaces.AF_INET]:
            print(value['addr'])
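
The addresses gathered this way could then feed the server's check. The sketch below is only an illustration of that wiring, not the change made in ros2cli: AllowlistXMLRPCServer and its allowed_addresses parameter are hypothetical, and the caller is assumed to pass in the addresses it enumerated.

```python
from xmlrpc.server import SimpleXMLRPCServer


class AllowlistXMLRPCServer(SimpleXMLRPCServer):
    """Hypothetical variant: accept peers whose source IP is in an allowlist."""

    def __init__(self, addr, allowed_addresses, **kwargs):
        # Loopback is always allowed; the rest comes from the caller's enumeration.
        self.allowed_addresses = frozenset(allowed_addresses) | {'127.0.0.1'}
        super().__init__(addr, **kwargs)

    def verify_request(self, request, client_address):
        # client_address is a (host, port) tuple for AF_INET sockets.
        if client_address[0] not in self.allowed_addresses:
            return False
        return super().verify_request(request, client_address)
```

For example, AllowlistXMLRPCServer(('127.0.0.1', 11511), local_addrs, logRequests=False), where local_addrs is the set of addresses collected by the loop above. Note that the set is captured once at construction, so addresses assigned later would be missed.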

fujitatomoya commented 2 years ago

@civerachb-cpr it would be really helpful if you could try https://github.com/ros2/ros2cli/pull/729 and see whether it fixes your problem.

civerachb-cpr commented 2 years ago

I'll give that a try when I have the time. I'm a little swamped with other work right now, but I'll do my best to test it out in the next week or two. Sorry I can't test it right away.