osrf / rmf_core

Provides the centralized functions of RMF: scheduling, etc.
Apache License 2.0

Using CycloneDDS instead of FastRTPS in RMF #294

Closed destkk closed 3 years ago

destkk commented 3 years ago

Ubuntu 20.04

Hi everyone, has anyone tried using CycloneDDS with RMF instead of the default FastDDS?

I have followed the steps to install everything needed for CycloneDDS, and I have tested communication locally and across servers through the CLI, but now I am unable to launch Gazebo with the RMF demos launch file.

Part of the error output is shown below:

[building_map_server-3] 1612940156.042290 [0] building_m: Failed to find a free participant index for domain 0
[building_map_server-3] [ERROR] [1612940156.042372026] [rmw_cyclonedds_cpp]: rmw_create_node: failed to create domain, error Error
[building_map_server-3] 
[building_map_server-3] >>> [rcutils|error_handling.c:108] rcutils_set_error_state()
[building_map_server-3] This error state is being overwritten:
[building_map_server-3] 
[building_map_server-3]   'error not set, at /tmp/binarydeb/ros-foxy-rcl-1.1.10/src/rcl/node.c:276'
[building_map_server-3] 
[building_map_server-3] with this new error message:
[building_map_server-3] 
[building_map_server-3]   'rcl node's rmw handle is invalid, at /tmp/binarydeb/ros-foxy-rcl-1.1.10/src/rcl/node.c:428'
[gzserver-13] 1612940156.138800 [0]   gzserver: Failed to find a free participant index for domain 0
[gzserver-13] 1612940156.138800 [0]   gzserver: Failed to find a free participant index for domain 0
[gzserver-13] [ERROR] [1612940156.139039182] [rmw_cyclonedds_cpp]: rmw_create_node: failed to create domain, error Error
[gzserver-13] 
[gzserver-13] >>> [rcutils|error_handling.c:108] rcutils_set_error_state()
[gzserver-13] This error state is being overwritten:
[gzserver-13] 
[gzserver-13]   'error not set, at /tmp/binarydeb/ros-foxy-rcl-1.1.10/src/rcl/node.c:276'
[gzserver-13] 
[gzserver-13] with this new error message:
[gzserver-13] 
[gzserver-13]   'rcl node's rmw handle is invalid, at /tmp/binarydeb/ros-foxy-rcl-1.1.10/src/rcl/node.c:428'
[gzserver-13] 
[gzserver-13] rcutils_reset_error() should be called after error handling to avoid this.
[gzserver-13] <<<
[gzserver-13] [ERROR] [1612940156.139215831] [rcl]: Failed to fini publisher for node: 1
[gzserver-13] terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
mxgrey commented 3 years ago

We definitely have users who run RMF on CycloneDDS without a problem.

The only suggestion I can think of would be to make sure that all of your terminals have the environment variable RMW_IMPLEMENTATION=rmw_cyclonedds_cpp set. Since FastDDS is the default RMW implementation, any terminals without that environment variable set will try to use FastDDS, and I have definitely seen issues arise when trying to mix different RMW implementations.
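
For example, one way to make sure every new terminal picks it up is to add the export to ~/.bashrc (a minimal sketch; a per-terminal export works just as well):

echo 'export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp' >> ~/.bashrc
source ~/.bashrc
echo $RMW_IMPLEMENTATION    # should print rmw_cyclonedds_cpp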

destkk commented 3 years ago

> We definitely have users who run RMF on CycloneDDS without a problem.
>
> The only suggestion I can think of would be to make sure that all of your terminals have the environment variable RMW_IMPLEMENTATION=rmw_cyclonedds_cpp set. Since FastDDS is the default RMW implementation, any terminals without that environment variable set will try to use FastDDS, and I have definitely seen issues arise when trying to mix different RMW implementations.

Hi there. Thank you for the prompt response. I have set RMW_IMPLEMENTATION=rmw_cyclonedds_cpp to make sure it is running on CycloneDDS, but I'm still facing the issue above when launching Gazebo with the RMF launch files.

Yadunund commented 3 years ago

To clarify, the command to run on every terminal is export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp. Then you can type echo $RMW_IMPLEMENTATION in the same terminals to check that the env variable has been set (You should see rmw_cyclonedds_cpp printed out)

I have been running all the scenarios in rmf_demos on CycloneDDS successfully.

destkk commented 3 years ago

> To clarify, the command to run on every terminal is export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp. Then you can type echo $RMW_IMPLEMENTATION in the same terminals to check that the env variable has been set (You should see rmw_cyclonedds_cpp printed out)
>
> I have been running all the scenarios in rmf_demos on CycloneDDS successfully.

Hi there. Thank you for the advice. I have run export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp in every terminal, but when I type echo $RMW_IMPLEMENTATION I do not see the line printed out.

I have also inserted export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp into .bashrc, and by checking with ros2 doctor --report I can see that the system has already set:

RMW MIDDLEWARE
middleware name    : rmw_cyclonedds_cpp

May I know if there are any ways I could troubleshoot this issue? My apologies, as I am still new to CycloneDDS and exploring its use.

destkk commented 3 years ago

To add on, I have implemented CycloneDDS using Husarnet. The steps I followed are based on the link below: https://husarion.com/tutorials/other-tutorials/husarnet-cyclone-dds

I am able to establish communication over CycloneDDS across different networks. I have tested node to node and node to CLI for topics and services, and it works fine. The only remaining issue is launching the demos in Gazebo, so I am not sure whether this is the correct method to implement it, or whether there are other files I would need to include or edit to make it work.
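
For reference, in that kind of setup the configuration file is normally handed to CycloneDDS through the CYCLONEDDS_URI environment variable; a sketch, assuming the file from the tutorial was saved as ~/cyclonedds.xml:

export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
export CYCLONEDDS_URI=file://$HOME/cyclonedds.xml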

mxgrey commented 3 years ago

If you're building rmw_cyclonedds from source, then my best guess at this point would be that something in the source code of the latest version of rmw_cyclonedds is not compatible with some feature that the RMF demo is using. That or maybe the CycloneDDS configuration that Husarnet is recommending is not compatible with some feature(s) that RMF is using.

As the maintainers of RMF, when we use CycloneDDS, we are installing the prebuilt package ros-foxy-rmw-cyclonedds-cpp and running it without any configuration file. I would suggest trying that simpler approach to see if it allows the demos to work. If the issue is being caused by a different version or different configuration of CycloneDDS, then I would suggest seeking help from https://github.com/ros2/rmw_cyclonedds and/or https://github.com/eclipse-cyclonedds/cyclonedds.
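
For example, the simpler setup described above boils down to something like this on Foxy (a sketch; launch the demo afterwards in the same terminal):

sudo apt install ros-foxy-rmw-cyclonedds-cpp
unset CYCLONEDDS_URI    # make sure no custom CycloneDDS configuration file is picked up
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp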

destkk commented 3 years ago

> To clarify, the command to run on every terminal is export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp. Then you can type echo $RMW_IMPLEMENTATION in the same terminals to check that the env variable has been set (You should see rmw_cyclonedds_cpp printed out)
>
> I have been running all the scenarios in rmf_demos on CycloneDDS successfully.

Hi, can I check: for all the scenarios you have been running on CycloneDDS, are you able to send a service call or any other commands across different networks? That is, sending and receiving from two different terminals, each connected to a different network.

destkk commented 3 years ago

> If you're building rmw_cyclonedds from source, then my best guess at this point would be that something in the source code of the latest version of rmw_cyclonedds is not compatible with some feature that the RMF demo is using. That or maybe the CycloneDDS configuration that Husarnet is recommending is not compatible with some feature(s) that RMF is using.
>
> As the maintainers of RMF, when we use CycloneDDS, we are installing the prebuilt package ros-foxy-rmw-cyclonedds-cpp and running it without any configuration file. I would suggest trying that simpler approach to see if it allows the demos to work. If the issue is being caused by a different version or different configuration of CycloneDDS, then I would suggest seeking help from https://github.com/ros2/rmw_cyclonedds and/or https://github.com/eclipse-cyclonedds/cyclonedds.

Hi there. Yes, it works without any configuration file. Once I add the configuration file, I'm unable to launch the demo in Gazebo. There is probably an incompatibility with some feature(s) of RMF.

Below is the configuration file:

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain id="any">
        <General>
            <NetworkInterfaceAddress>auto</NetworkInterfaceAddress>
            <AllowMulticast>false</AllowMulticast>
            <MaxMessageSize>65500B</MaxMessageSize>
            <FragmentSize>4000B</FragmentSize>
            <Transport>udp6</Transport>
        </General>
        <Discovery>
            <Peers>
                <Peer address="IPV6"/>
                <Peer address="IPV6"/>
            </Peers>
            <ParticipantIndex>auto</ParticipantIndex>
        </Discovery>
        <Internal>
            <Watermarks>
                <WhcHigh>500kB</WhcHigh>
            </Watermarks>
        </Internal>
        <Tracing>
            <Verbosity>severe</Verbosity>
            <OutputFile>stdout</OutputFile>
        </Tracing>
    </Domain>
</CycloneDDS>
mxgrey commented 3 years ago

I'm afraid we don't really have the time or resources to debug specific configurations of the underlying DDS. I'm glad the problem has been narrowed down to an incompatible configuration, but I'm afraid that's as far as we're able to go with free community support.

I'll be closing this issue as it's outside our scope, but feel free to follow up if you are able to contribute a non-breaking change that allows RMF to work with the desired configuration.

codebot commented 3 years ago

Yeah, I haven't ventured yet into configuring Cyclone in a non-default manner like this. What jumps out at me though are the lines that have been copied but not filled in from the Husarion documentation like this:

  <Peer address="[IPV6-address]"/>

I would expect those lines to contain valid IPv6 addresses, not the placeholder IPV6.
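
For illustration only, filled-in entries would look something like this (these are made-up placeholder addresses, not real hosts; Husarnet assigns each machine its own IPv6 address):

  <Peer address="fc94:1111:2222:3333:4444:5555:6666:7777"/>
  <Peer address="fc94:aaaa:bbbb:cccc:dddd:eeee:ffff:1234"/>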

destkk commented 3 years ago

> Yeah, I haven't ventured yet into configuring Cyclone in a non-default manner like this. What jumps out at me though are the lines that have been copied but not filled in from the Husarion documentation like this:
>
>   <Peer address="[IPV6-address]"/>
>
> I would expect that those lines are intended to have valid ipv6 addresses in them, not IPV6

Hi there. Yes, the IPv6 addresses have been filled in; I just did not want to show them publicly.

mxgrey commented 3 years ago

One detail that jumps out to me in the configuration is <ParticipantIndex>auto</ParticipantIndex> since the pertinent error message seems to be

[building_map_server-3] 1612940156.042290 [0] building_m: Failed to find a free participant index for domain 0
...
[gzserver-13] 1612940156.138800 [0]   gzserver: Failed to find a free participant index for domain 0

But I'm saying that without any experience or expertise in configuring CycloneDDS.

destkk commented 3 years ago

> One detail that jumps out to me in the configuration is <ParticipantIndex>auto</ParticipantIndex> since the pertinent error message seems to be
>
> [building_map_server-3] 1612940156.042290 [0] building_m: Failed to find a free participant index for domain 0
> ...
> [gzserver-13] 1612940156.138800 [0]   gzserver: Failed to find a free participant index for domain 0
>
> But I'm saying that without any experience or expertise in configuring CycloneDDS.

My apologies. Just another question: have you tried implementing FastRTPS across different networks, then?

codebot commented 3 years ago

Yes, we have used FastRTPS across multiple network segments, as well as Cyclone. Configuring this stuff is surprisingly tricky and nuanced. These XML configuration files look innocent but they are hiding a massive amount of complexity inside the DDS implementations. As much as we'd like to say "it's not a ROS2 / RMF problem" we understand that because of the coupling between DDS and ROS2/RMF it can feel that way. But this does seem like it's a DDS configuration problem, and we would suggest getting in direct contact with the DDS vendors (CycloneDDS = ADLINK, FastDDS/FastRTPS = eProsima), since their experts solve these types of problems every day, and can help you dig into what's happening.

destkk commented 3 years ago

> Yes, we have used FastRTPS across multiple network segments, as well as Cyclone. Configuring this stuff is surprisingly tricky and nuanced. These XML configuration files look innocent but they are hiding a massive amount of complexity inside the DDS implementations. As much as we'd like to say "it's not a ROS2 / RMF problem" we understand that because of the coupling between DDS and ROS2/RMF it can feel that way. But this does seem like it's a DDS configuration problem, and we would suggest getting in direct contact with the DDS vendors (CycloneDDS = ADLINK, FastDDS/FastRTPS = eProsima), since their experts solve these types of problems every day, and can help you dig into what's happening.

Hi there. May I know what the correct configuration file would be to make RMF work across different networks?

codebot commented 3 years ago

Unfortunately this is completely situation-dependent, since there are endless variations of networking scenarios. It is not possible to provide a generic example, unfortunately. I wish it were. But in order for someone to help you create a working configuration file, you will need to precisely describe your situation. Often the best way to do this is with a cartoon diagram. This description/diagram needs to include IP addresses of all machines, interconnecting infrastructure, topology, and any other constraints (i.e. "multicast is not supported on this subnet"), etc.

txlei commented 3 years ago

@destkk The participant index limit can be raised using https://github.com/eclipse-cyclonedds/cyclonedds/blob/master/docs/manual/options.md#cycloneddsdomaindiscoverymaxautoparticipantindex

Add this to your CycloneDDS config file: <MaxAutoParticipantIndex>100</MaxAutoParticipantIndex>
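
For example, in the configuration posted above it would go inside the Discovery section (a sketch; 100 is just a generous upper bound, not a required value):

    <Discovery>
        <Peers>
            <Peer address="IPV6"/>
            <Peer address="IPV6"/>
        </Peers>
        <ParticipantIndex>auto</ParticipantIndex>
        <MaxAutoParticipantIndex>100</MaxAutoParticipantIndex>
    </Discovery>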