ros2 / rmw_cyclonedds

ROS 2 RMW layer for Eclipse Cyclone DDS
Apache License 2.0

ROS_LOCALHOST_ONLY is not preventing cross talking between machines #370

Open doisyg opened 3 years ago

doisyg commented 3 years ago

Bug report

Required Info:

Steps to reproduce issue

Connect your machine to a network with multiple other machines running ROS 2

ros2 node list

Expected behavior

With export ROS_LOCALHOST_ONLY=1, no nodes should be listed if nothing runs on your machine

Actual behavior

I get multiple nodes listed, several with the exact same name.

Additional information

If using export ROS_DOMAIN_ID='unique_id_on_the_network', no nodes are listed. If switching off my wifi interface, no nodes are listed. If connecting to another wifi network (with no other ROS 2 machines), no nodes are listed.

I would expect to be isolated in the same way and to see no difference between using ROS_LOCALHOST_ONLY=1 and using ROS_DOMAIN_ID='unique_id_on_the_network'.
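For reference, a minimal shell sketch of the two isolation approaches being compared here (the domain ID 42 is an arbitrary example; ROS_DOMAIN_ID takes an integer, and the daemon restart matters, as the next comment explains):

# Option 1: restrict DDS traffic to the loopback interface
export ROS_LOCALHOST_ONLY=1
ros2 daemon stop && ros2 daemon start   # restart so the daemon picks up the new setting
ros2 node list                          # expected: only nodes running on this machine

# Option 2: move to a DDS domain no other machine on the network uses
unset ROS_LOCALHOST_ONLY
export ROS_DOMAIN_ID=42                 # integer; 42 is an arbitrary example
ros2 daemon stop && ros2 daemon start
ros2 node list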

fujitatomoya commented 3 years ago

I think the ros2 daemon is already running and has cached endpoints via discovery. ros2 node list asks the daemon for the node list via XML-RPC, and the daemon returns the cached list.

### Host-A
# just in case, clear that out.
root@f8a93cb8cfbd:~# unset ROS_LOCALHOST_ONLY
# start publisher
root@f8a93cb8cfbd:~# ros2 run demo_nodes_cpp talker
[INFO] [1619075202.150248497] [talker]: Publishing: 'Hello World: 1'
...

### Host-B
# set env variable only to have localhost network
root@24c44a10b658:~# export ROS_LOCALHOST_ONLY=1
# print node list
root@24c44a10b658:~# ros2 node list
/talker               ---------------> problem confirmed.
# restart daemon to clear discovery cache.
root@24c44a10b658:~# ros2 daemon stop
The daemon has been stopped
root@24c44a10b658:~# ros2 daemon start
The daemon has been started
# list node again
root@24c44a10b658:~# ros2 node list
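# (no output expected: the restarted daemon, now localhost-only, no longer sees the remote /talker)
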
fujitatomoya commented 3 years ago

@doisyg could you check my previous comment? If you still have the problem, let us know 😃

doisyg commented 3 years ago

Hi @fujitatomoya. When I noticed the issue, I had no control over host(s) A (but I was on the same network), so I don't know exactly what was running on them. The machine I controlled was host B, and yes, even after manually stopping the daemon (or rebooting), the problem persisted.

I cannot reproduce it with two machines that I control and the talker example. I know it is a fuzzy report, but I am almost certain that there is an issue, as I have noticed it a couple of times. It disturbs our setup enough that we have all resorted to using ROS_DOMAIN_ID. What else can I run the next time I notice the issue (I would have to be in a shared office with other ROS 2 devs)? Is there any way of knowing from which IP the "phantom nodes" are coming?

fujitatomoya commented 3 years ago

even after manually stopping the daemon (or rebooting), the problem persisted.

okay...

I know it is a fuzzy report, but I am almost certain that there is an issue, as I have noticed it a couple of times.

I am not saying there is no problem, and this sometimes happens... 😢 But without a reproducible procedure, it is really hard to debug.

Is there any way of knowing from which IP the "phantom nodes" are coming?

I think getting the IP address requires debug information from DDS (the RMW implementation, Cyclone or Fast DDS), and I am not sure how to do that...
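A DDS-agnostic fallback, sketched here under the assumption that the default domain 0 is in use: capture the RTPS discovery port with tcpdump and read the source IPs off the packets.

# Show DDS discovery (SPDP) packets and the IP addresses sending them.
# Domain 0 uses UDP port 7400 for discovery; other domains use
# 7400 + 250 * domain_id.
sudo tcpdump -n -i any udp port 7400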

fujitatomoya commented 3 years ago

could be related to https://github.com/ros2/rmw_cyclonedds/issues/311

eboasson commented 3 years ago

could be related to ros2/rmw_cyclonedds#311

I doubt it — that would more likely than not cause it to not work at all.

I think getting the IP address requires debug information from DDS (the RMW implementation, Cyclone or Fast DDS), and I am not sure how to do that...

For Cyclone, the quickest route is usually still to enable (discovery) tracing: if you set the CYCLONEDDS_URI environment variable to an XML file containing

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain id="any">
        <Tracing>
            <Verbosity>fine</Verbosity>
            <OutputFile>cdds.log.${CYCLONEDDS_PID}</OutputFile>
        </Tracing>
    </Domain>
</CycloneDDS>

you'll get a text file with tons of details. Do look for lines matching the regex SPDP.*NEW; when in doubt, I'll be happy to help. (CYCLONEDDS_URI is really a comma-separated list of files, URIs (only file:// for now) and configuration fragments, so if you already have a file, you can edit it or you can add another one; or, if you are like me and lazy, you can copy-paste an abbreviated form into CYCLONEDDS_URI directly: <Tr><V>fine</><Out>cdds.log.${CYCLONEDDS_PID}</> in the environment variable will do exactly the same).
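Putting that into a concrete shell session (the XML file path below is an example; the grep pattern is the one given above):

# Option A: point CYCLONEDDS_URI at the XML file shown above
export CYCLONEDDS_URI=file:///path/to/cyclonedds.xml

# Option B: the abbreviated inline fragment (single quotes keep the shell
# from expanding ${CYCLONEDDS_PID} itself)
export CYCLONEDDS_URI='<Tr><V>fine</><Out>cdds.log.${CYCLONEDDS_PID}</>'

# Reproduce the problem, then look for newly discovered participants in the trace
ros2 node list
grep -E 'SPDP.*NEW' cdds.log.*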

The API makes discovery information available via built-in topics, and I do intend to add IP addresses to that data. Especially now that it has become really accessible (e.g. https://github.com/eclipse-cyclonedds/cyclonedds-python/blob/master/src/cyclonedds/tools/ddsls.py), that is the way to do these things. For now, however, the traces are best (or Wireshark, I suppose).

eboasson commented 3 years ago

P.S. ROS_LOCALHOST_ONLY causes it to use the loopback interface, that is:

One would expect that this would keep it isolated from the rest of the network, but if it nonetheless receives a participant discovery packet (i.e., the message that bootstraps the DDS discovery mechanism) from another machine, then things become a little tricky, because the version of "6 days ago" (as GitHub so helpfully pretty-prints the date) will not discard it.

I think re-trying it with the latest version fixes that (though I would wait for https://github.com/eclipse-cyclonedds/cyclonedds/pull/774 to be merged, which should happen really soon), but I haven't specifically tried it.

doisyg commented 3 years ago

Thanks @eboasson for the detailed answer. I will wait for the latest fixes and report back here.

One would expect that this would keep it isolated from the rest of the network, but if it nonetheless receives a participant discovery packet (i.e., the message that bootstraps the DDS discovery mechanism) from another machine, then things become a little tricky, because the version of "6 days ago" (as GitHub so helpfully pretty-prints the date) will not discard it.

That would explain what I am seeing.

xmfcx commented 1 year ago

We have also experienced the same thing on ROS 2 Humble.

The best way for us to stop communication was to:

Also, in order to check whether you are most likely part of some DDS activity, you can use Wireshark by applying rtps as a filter.

In large office networks, this can be useful. In our office, once everyone applied these steps, the rtps traffic dropped to zero for us.
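For a terminal-only variant of that check, tshark accepts the same display filter; a sketch, with eth0 standing in for whatever interface faces the office network:

# Show only RTPS (DDS) traffic; any hits mean DDS packets are still
# crossing this interface
sudo tshark -i eth0 -Y rtps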