amalnanavati opened this issue 7 months ago
That sounds very much like Cyclone DDS switching to multicast to avoid having to send the data multiple times (once for each subscriber), with all multicasts then getting forwarded to the physical network.
There are two possible reasons for forwarding the multicasts to the network: the switch may flood multicast traffic to all ports because it doesn't track group membership, or other machines on the network may be listening on the same default multicast addresses. The first is easily fixed by enabling IGMP snooping; the second is easily fixed by configuring the used multicast addresses differently (Discovery/DefaultMulticastAddress in the Cyclone DDS XML is the easiest).
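As a minimal sketch of that second fix (the address 239.255.42.1 below is only an example; pick whatever administratively scoped address suits your network):
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain Id="any">
    <Discovery>
      <!-- example value; replace with an address your switches/router treat the way you want -->
      <DefaultMulticastAddress>239.255.42.1</DefaultMulticastAddress>
    </Discovery>
  </Domain>
</CycloneDDS>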
Another approach is of course to not use multicast at all for data; that's most easily done by setting General/AllowMulticast to spdp (the default if the machine is using a WiFi adapter). Incidentally, setting Discovery/DefaultMulticastAddress to 0.0.0.0 has the same effect but still allows you to configure multicast for specific topics.
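A sketch of that first approach, using the same skeleton as above:
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain Id="any">
    <General>
      <!-- multicast only for participant discovery (SPDP); all data goes unicast -->
      <AllowMulticast>spdp</AllowMulticast>
    </General>
  </Domain>
</CycloneDDS>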
Yet another route is to configure it to use both the loopback interface and the ethernet interface. If I'm not mistaken, all else being equal it prioritises loopback over ethernet, and so if the loopback interface supports multicast it will do that. The issue with this route is that Linux has a quirk where an IPv4 loopback interface will happily perform multicasts, but the "multicast" flag isn't set on the interface. You can set that by hand, or you can use an override in the Cyclone configuration. (Cyclone master is a bit more helpful, but not 100% compatible with ROS 2 at the moment. This I am trying to fix asap.)
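Setting the flag by hand on Linux is a one-liner (assuming the loopback interface is named lo):
sudo ip link set lo multicast on
ip link show lo   # "MULTICAST" should now appear in the flags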
And with regards to the differences with FastRTPS: different DDS implementations make different choices w.r.t. the use of multicast, selecting interfaces to use, and choosing addresses to send data to when multiple options exist. Cyclone is a bit unusual in that it uses only one interface by default and that it really likes to use multicast.
The one-interface-by-default may not be ideal in the "modern" context with containers, virtual network interfaces and oftentimes multi-homed machines (wifi+wired), because it means it sometimes doesn't work out-of-the-box. The other side of the coin is that at least it avoids spraying junk all over the networks when most systems really want to use a very specific subset of the network interfaces, but that this typically stays hidden for a long time.
For multicast the biggest issues are that switches drop it more quickly than they drop unicast, that it works very badly with WiFi, and that IP multicast typically ends up anywhere there's a DDS process running (if one sticks to the default addresses), even when those processes are unrelated. So making it a bit less eager to switch to multicast is probably wise.
This response is super helpful, thank you so much for the details!
For multicast the biggest issues are that switches drop it more quickly than they drop unicast
This is precisely the issue we had that motivated me to look into WiFi throughput, leading to the above issue. Our router was dropping clients, and it's helpful to understand that multicast could be the reason.
For what it’s worth, I did enable multicast on loopback manually, and still got the above results. It seemed to me that Cyclone was preferring Ethernet as its “one interface by default.”
It sounds like getting CycloneDDS to work in this scenario would require more careful configuration of the XML file. This issue is not high-priority for me as FastRTPS seems (for now) to be working in my scenario. However, if/when I switch back to CycloneDDS, I’ll follow your pointers on tuning the XML file. Thanks for the detailed pointers and explanation!
Yes, it defaults to a single interface and prefers Ethernet ... it takes some more attention to the configuration.
If you'd be interested in checking that it works as you expect when you simply avoid multicast, it should be sufficient to do export CYCLONEDDS_URI="<Disc><DefaultMulticastAddress>0.0.0.0</></>" before starting the applications. You don't actually need to have an XML file somewhere; you can use a shorthand directly in the environment variable.
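For example (the talker demo here is just a stand-in for your own nodes):
export CYCLONEDDS_URI="<Disc><DefaultMulticastAddress>0.0.0.0</></>"
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
ros2 run demo_nodes_cpp talker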
I'm sure you understand I would prefer it if you use Cyclone DDS 😂 But if you're happy with FastRTPS then go do fun things instead of tweaking middleware!
I started looking into this more, and here is what I found:
I tried your suggestion of doing export CYCLONEDDS_URI="<Disc><DefaultMulticastAddress>0.0.0.0</></>". This worked in terms of preventing packets from unnecessarily being sent through the network. But did I understand you correctly that doing so essentially disables multicast?
I'd ideally like the following behavior: use multicast on both loopback and ethernet, but prefer loopback and only use ethernet if necessary (e.g., if the communication is going across devices). This is driven by our application details, where we have many subscribers per publisher, and most publisher-subscriber pairs are on the same machine, but a few cross machines (requiring ethernet).
Do you have suggestions on achieving the above behavior with a CycloneDDS config file? I'm also looking through the docs, but any suggestions you have would be much appreciated. Currently, my config file is pretty barebones:
<?xml version="1.0" encoding="utf-8"?>
<CycloneDDS
    xmlns="https://cdds.io/config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd"
>
  <Domain Id="any">
    <Discovery>
      <DefaultMulticastAddress>0.0.0.0</DefaultMulticastAddress>
    </Discovery>
  </Domain>
</CycloneDDS>
(As a side-note, although FastRTPS addressed the issue of our network dropping clients, it introduced a new issue where small messages would sometimes be received with high latency. In our application we have lots of publishers/subscribers operating at 15-30Hz, so lots of messages are being communicated. I assume the FastRTPS issue might be due to it not using multicast, and you had mentioned that CycloneDDS strongly prefers multicast, so I figured I'd return to the CycloneDDS world and try to configure it correctly 😄)
@eboasson It looks like there's been a good discussion here. Is this one resolved or does it need more work? 🧇
@eboasson Any pointers on what parameters to set to get the behavior I specified in my above message? Thanks in advance!
I am so hopelessly behind on everything RMW, and ever so slightly less behind on everything Cyclone ... Thankfully you just kept pinging me ☺️🙏
I tried your suggestion of doing export CYCLONEDDS_URI="<Disc><DefaultMulticastAddress>0.0.0.0</></>". This worked in terms of preventing packets from unnecessarily being sent through the network. But did I understand you correctly that doing so essentially disables multicast?
Yes, but there's "disabling" and "really disabling". This means there is no default multicast address anymore, which has two effects: participant discovery still uses multicast (just like setting AllowMulticast to spdp), and you can still assign multicast addresses to specific topics via "network partitions". For example, you can map *.* (first is DDS partition, second is DDS topic) to 239.255.0.1 or whatever multicast address you'd like to use. But you can also map only some topics to address 1, some topics to address 2 and leave everything else to use unicast. (The first match wins.)
The terminology is ancient and inherited from OpenSplice DDS's original native protocol stack (one that antedates the DDSI protocol), but if you can live with that then something like:
<Discovery>
  <DefaultMulticastAddress>0.0.0.0</DefaultMulticastAddress>
</Discovery>
<Partitioning>
  <NetworkPartitions>
    <NetworkPartition Name="multi" Address="239.255.0.2"/>
  </NetworkPartitions>
  <PartitionMappings>
    <PartitionMapping DCPSPartitionTopic="*.*low_rate*" NetworkPartition="multi"/>
  </PartitionMappings>
</Partitioning>
to map all traffic with topics containing "low_rate" in the name to 239.255.0.2 and leave everything else at unicast.
You can't currently write exclusions, so there's no way to say "everything but this topic". I think it would be really easy to improve it to do exclusions and to allow specifying that it has to map to unicast only. It's just not there today.
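To illustrate the "first match wins" behaviour described above, here is a sketch with two partitions (the topic name patterns are made up for the example):
<Partitioning>
  <NetworkPartitions>
    <NetworkPartition Name="cameras" Address="239.255.0.2"/>
    <NetworkPartition Name="status" Address="239.255.0.3"/>
  </NetworkPartitions>
  <PartitionMappings>
    <!-- mappings are tried in order; the first matching one wins -->
    <PartitionMapping DCPSPartitionTopic="*.*image*" NetworkPartition="cameras"/>
    <PartitionMapping DCPSPartitionTopic="*.*state*" NetworkPartition="status"/>
    <!-- anything unmatched falls back to the default (here: unicast, because
         DefaultMulticastAddress is 0.0.0.0) -->
  </PartitionMappings>
</Partitioning>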
I'd ideally like the following behavior: use multicast on both loopback and ethernet, but prefer loopback and only use ethernet if necessary (e.g., if the communication is going across devices). This is driven by our application details, where we have many subscribers per publisher, and most publisher-subscriber pairs are on the same machine, but a few cross machines (requiring ethernet).
I think that requires no more than:
<General>
  <Interfaces>
    <NetworkInterface name="lo0" priority="default" multicast="true"/>
    <NetworkInterface name="en0" priority="default"/>
  </Interfaces>
</General>
(macOS network interface names, but you get the idea.) The multicast="true" overrides whatever the kernel says and should work on Linux; it is not needed if you have set the MULTICAST flag on the interface. I have to admit there's a slight risk I am confusing behaviour in master with behaviour in 0.10.x, and in any case on master it would figure out that multicast over loopback works fine without any configuration options.
(If you want to use Cyclone DDS master with ROS, then https://github.com/eboasson/rmw_cyclonedds/tree/cdds-master will probably do the trick. I intend to do a PR after I dot the i's and cross the t's ... It has some nice side benefits, like integrating type discovery in ROS with type discovery in DDS.)
Cyclone considers loopback to be lower cost than Ethernet so this should automatically result in all traffic using loopback only when possible. If everything works the way it should, doing:
<NetworkInterface name="en0" priority="default" allow_multicast="false"/>
should prevent the use of multicast over ethernet while allowing it over loopback. (I'm not sure it will work as intended, especially not in 0.10.x.)
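Putting the snippets from this thread together, a complete file for the prefer-loopback setup might look like the following (a sketch only: as noted above, allow_multicast may not behave as intended in 0.10.x, and the Linux interface names lo/eth0 are machine-specific):
<?xml version="1.0" encoding="utf-8"?>
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain Id="any">
    <General>
      <Interfaces>
        <!-- prefer loopback for same-machine traffic; force-enable multicast on it -->
        <NetworkInterface name="lo" priority="default" multicast="true"/>
        <!-- ethernet for cross-machine traffic, with multicast suppressed -->
        <NetworkInterface name="eth0" priority="default" allow_multicast="false"/>
      </Interfaces>
    </General>
  </Domain>
</CycloneDDS>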
Hope this is still useful after so much time!
Bug report
Overview: While monitoring the LAN in/out throughput on my router, I launch a publisher on my local machine, followed by one subscriber (ros2 topic echo ...), followed by a second subscriber (ros2 topic echo ...). As soon as I launch the second subscriber, my router registers packets going through the LAN corresponding to the throughput of the topic, even though all publishers and subscribers are running on the local machine.
Required Info:
Steps to reproduce issue
Reproducing this issue requires a way to monitor router LAN throughput; I use NetCloud OS for the Cradlepoint IBR900-600M router. Additionally, this issue is most pronounced when the topic has high throughput; thus, for the publisher I use the DummyRealSense node from this package, and its raw aligned color to depth image.
While monitoring LAN in/out throughput of your router, do:
ros2 daemon stop
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
ros2 launch feeding_web_app_ros2_test feeding_web_app_dummy_nodes_launch.xml run_web_bridge:=false run_food_detection:=false run_face_detection:=false run_food_on_fork_detection:=false run_motion:=false
ros2 topic echo /camera/aligned_depth_to_color/image_raw | grep sec
ros2 topic echo /camera/aligned_depth_to_color/image_raw | grep sec
To compare against FastRTPS, repeat the above steps after doing:
ros2 daemon stop
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
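If you don't have router-side monitoring, a rough substitute is to watch for the multicast traffic directly on the machine (assuming the wired interface is named eth0; the default DDSI multicast addresses fall in 239.255.0.0/16):
# show DDS multicast packets leaving via the wired interface
sudo tcpdump -i eth0 udp and net 239.255.0.0/16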
Expected behavior
LAN in/out throughput should be near 0 Mbps the entire time.
Actual behavior
While using CycloneDDS, after we launch the second subscriber, the LAN throughput jumps up to the throughput of the topic (~70 Mbps). Terminating the second subscriber brings it back down. This issue does not happen with FastRTPS. See the below screenshot for LAN throughput with annotated timestamps corresponding to the above steps.
Additional information