doisyg opened this issue 3 years ago
i think the ros2 daemon is already running and caches endpoints via discovery. and ros2 node list
will ask the daemon for the node list via xmlrpc, then the daemon returns the cached list.
### Host-A
# just in case, clear that out.
root@f8a93cb8cfbd:~# unset ROS_LOCALHOST_ONLY
# start publisher
root@f8a93cb8cfbd:~# ros2 run demo_nodes_cpp talker
[INFO] [1619075202.150248497] [talker]: Publishing: 'Hello World: 1'
...
### Host-B
# set env variable to use only the localhost network
root@24c44a10b658:~# export ROS_LOCALHOST_ONLY=1
# print node list
root@24c44a10b658:~# ros2 node list
/talker ---------------> problem confirmed.
# restart daemon to clear discovery cache.
root@24c44a10b658:~# ros2 daemon stop
The daemon has been stopped
root@24c44a10b658:~# ros2 daemon start
The daemon has been started
# list node again
root@24c44a10b658:~# ros2 node list
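just for reference, if you want to rule out the daemon cache without restarting the daemon, recent ros2cli versions can do their own discovery (a sketch; option names assumed from current ros2cli, check ros2 node list --help):
# ask the CLI to discover nodes itself instead of querying the daemon over xmlrpc
ros2 node list --no-daemon --spin-time 5
# if this comes back empty while the daemon-backed list does not, the stale entries live in the daemon cache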
@doisyg could you check my previous comment? and if you still have the problem, let us know 😃
Hi @fujitatomoya, so when I noticed the issue I had no control over host(s) A (but I was on the same network), so I don't know exactly what was running on them. The machine I controlled was host B, and yes, even after manually stopping the daemon (or rebooting), the problem persisted.
I cannot reproduce it with 2 machines that I control and the talker example. I know it is a fuzzy report, but I am almost certain that there is an issue as I noticed it a couple of times. And it disturbs our setup enough that we have all resorted to using ROS_DOMAIN_ID. What else can I run the next time I notice the issue (I have to be in a shared office with other ROS 2 devs)? Is there any way of knowing from which IP the "phantom nodes" are coming?
even after manually stopping the daemon (or rebooting), the problem persisted.
okay...
I know it is a fuzzy report, but I am almost certain that there is an issue as I noticed it a couple of times.
i am not saying there is no problem, and this sometimes happens... 😢 but w/o a reproducible procedure, it would be really hard to debug.
Is there any way of knowing from which IP the "phantom nodes" are coming?
i think getting the IP address requires debug information from the dds (rmw implementation, cyclone or fastdds), and i am not sure how to do that...
could be related to https://github.com/ros2/rmw_cyclonedds/issues/311
could be related to ros2/rmw_cyclonedds#311
I doubt it — that would more likely than not cause it to not work at all.
i think getting the IP address requires debug information from the dds (rmw implementation, cyclone or fastdds), and i am not sure how to do that...
For Cyclone, the quickest route is usually still to enable (discovery) tracing: if you set the CYCLONEDDS_URI environment variable to an XML file containing
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
  <Domain id="any">
    <Tracing>
      <Verbosity>fine</Verbosity>
      <OutputFile>cdds.log.${CYCLONEDDS_PID}</OutputFile>
    </Tracing>
  </Domain>
</CycloneDDS>
you'll get a text file with tons of details. Do look for lines matching the regex SPDP.*NEW; when in doubt, I'll be happy to help. (CYCLONEDDS_URI is really a comma-separated list of files, URIs (only file:// for now) and configuration fragments, so if you already have a file, you can edit it or you can add another one; or, if you are like me and lazy, you can copy-paste an abbreviated form into CYCLONEDDS_URI directly: <Tr><V>fine</><Out>cdds.log.${CYCLONEDDS_PID}</> in the environment variable will do exactly the same.)
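To make that concrete, here is a sketch of the round trip, assuming the configuration above was saved as ~/cyclone-trace.xml (the path is made up):
# point Cyclone at the trace configuration before starting the node
export CYCLONEDDS_URI=file://$HOME/cyclone-trace.xml
ros2 run demo_nodes_cpp talker
# afterwards, look for newly discovered remote participants in the trace
grep -E 'SPDP.*NEW' cdds.log.*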
The API makes discovery information available via built-in topics, and I do intend to add IP addresses to that data. Especially now that it has become really accessible (e.g. https://github.com/eclipse-cyclonedds/cyclonedds-python/blob/master/src/cyclonedds/tools/ddsls.py) that is the way to do these things. For now, however, the traces are best (or wireshark, I suppose).
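(If you want to poke at the built-in topics from Python today, a hedged pointer: the cyclonedds package on PyPI installs the ddsls tool linked above; exact flags vary between versions, so start from its help output.)
# install the Python bindings and see what the discovery listing tool offers
pip install cyclonedds
ddsls --help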
P.S. ROS_LOCALHOST_ONLY causes it to use the loopback interface. One would expect that this would keep it isolated from the rest of the network, but if it nonetheless receives a participant discovery packet (i.e., the message that bootstraps the DDS discovery mechanism) from another machine, then things become a little tricky because the version of "6 days ago" (as GitHub so helpfully pretty-prints the date) will not discard it.
I think retrying it with the latest version fixes that (though I would wait for https://github.com/eclipse-cyclonedds/cyclonedds/pull/774 to be merged, which should be real soon), but I haven't specifically tried it.
Thanks @eboasson for the detailed answer. I will then wait for the latest fixes and report here.
One would expect that this would keep it isolated from the rest of the network, but if it nonetheless receives a participant discovery packet (i.e., the message that bootstraps the DDS discovery mechanism) from another machine, then things become a little tricky because the version of "6 days ago" (as GitHub so helpfully pretty-prints the date) will not discard it.
That would explain what I am seeing
We have also experienced the same thing on ROS 2 Humble.
The best way for us to stop communication was to:
- create a cyclonedds.xml with:
  <Interfaces>
    <NetworkInterface name="lo" priority="default" multicast="default" />
  </Interfaces>
- remove the export ROS_LOCALHOST_ONLY=1 line from .bashrc
- ros2 daemon stop
- sudo ip link set lo multicast on
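Putting those steps together, a minimal sketch might look like this (the file location and the General/Interfaces nesting are assumptions based on recent CycloneDDS config schemas, not taken verbatim from the steps above):
# write a config that pins CycloneDDS to the loopback interface (hypothetical path)
cat > ~/cyclonedds.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain id="any">
    <General>
      <Interfaces>
        <NetworkInterface name="lo" priority="default" multicast="default" />
      </Interfaces>
    </General>
  </Domain>
</CycloneDDS>
EOF
# point the RMW at it and drop any cached discovery state
export CYCLONEDDS_URI=file://$HOME/cyclonedds.xml
ros2 daemon stop
# loopback needs multicast enabled for local discovery to keep working
sudo ip link set lo multicast on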
Also, in order to check whether you are part of some (most likely unintended) DDS activity, you can use Wireshark by applying rtps as a filter. In large office networks this can be useful. In our office, once everyone had applied these steps, the rtps traffic dropped to zero for us.
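A terminal equivalent of that check, assuming tshark (Wireshark's CLI) is installed:
# show any RTPS (DDS discovery/data) traffic reaching this machine, on any interface
sudo tshark -i any -Y rtps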
Bug report
Required Info:
Steps to reproduce issue
Connect your machine to a network with multiple other machines running ROS 2
Expected behavior
With export ROS_LOCALHOST_ONLY=1, no nodes should be listed if nothing runs on your machine.
Actual behavior
I get multiple nodes listed, several with the exact same name.
Additional information
If using export ROS_DOMAIN_ID='unique_id_on_the_network', no nodes are listed. If switching off my wifi interface, no nodes are listed. If connecting to another wifi network (with no other ROS 2 machines), no nodes are listed.
I would expect to be isolated in the same way and to see no difference between using ROS_LOCALHOST_ONLY=1 and using ROS_DOMAIN_ID='unique_id_on_the_network'.
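For reference, the shortest check sequence on the affected machine would be something like this (a sketch; it assumes at least one other ROS 2 machine is active on the same LAN):
export ROS_LOCALHOST_ONLY=1
ros2 daemon stop   # make sure no cached discovery data is returned
ros2 node list     # expected: empty; observed: nodes from the other machines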