mvukov / rules_ros2

Build ROS 2 with Bazel
Apache License 2.0

Unable to communicate talker and listener each other in the chatter example #177

Closed aochiai closed 6 months ago

aochiai commented 1 year ago

Hi, I'm struggling with some weird behaviors of rclcpp publisher / subscriber.

In my environment, the talker and listener in the chatter example built with rules_ros2 can't communicate with each other. A message published by the talker never reaches the listener. According to the CycloneDDS trace log, the listener doesn't seem to discover the talker correctly. But they work as expected if I port them to a new ament/cmake package and build with colcon. I also confirmed that the ros2 topic command can only communicate with the colcon-built version.

In addition to the behavior above, a subscriber can't receive topics published by external ROS packages that use message types outside the common ROS ones. Let's say that an external package publishes a topic with the message type Foo. I copied the IDL and built it with ros2_interface_library and friends from rules_ros2. In this case the talker built with rules_ros2 can't communicate with the listener built with colcon.

I investigated this problem by collecting Debug logs of rclcpp and traces of CycloneDDS but couldn't find a clear solution. Any hints or pointers would be welcome. Thanks.

My environment is as follows:

mvukov commented 1 year ago

Hi, please make a repo with a minimal example that can be reproduced. Perhaps put together a Dockerfile that creates a minimal reproducible environment for your issue. FWIW, rules_ros2 doesn't need a system ROS 2 installation.

aochiai commented 1 year ago

Hi, thank you for responding.

I used rules_ros2/examples/chatter without any changes. I also ported it to colcon in this repo. Please create a colcon workspace, put the above repo in src/, then build.

aochiai commented 1 year ago

I'm trying to reproduce it within a Docker container. Please wait for a while.

mvukov commented 11 months ago

How's it going with your investigation?

samehmohamed88 commented 5 months ago

@mvukov I am actually experiencing the same thing. The issue is only reproducible for me inside the docker container. Meaning if I am on native Ubuntu 22.04, communication across two independently started nodes works fine. It's only when inside a docker container that this communication breaks down.

My container runs Ubuntu 22.04 and is started in privileged mode with host networking.

I use Docker for development only, so I am not actually blocked by this. But it does prevent me from testing some things out.

I apologize that I don't have enough time this week to dig into it any more than this.

mvukov commented 5 months ago

Hi, so, how can I reproduce this? Please make a repo with a minimal example that can be reproduced. Perhaps put together a Dockerfile that creates a min reproducible env for your issue.

ahans commented 5 months ago

This issue is probably not rules_ros2-specific, but has to do with CycloneDDS. By default, it uses a single network interface that is determined automatically. Maybe this behaves differently in the container vs native on the host. I was not able to repro the issue. Both inside the container as well as between container and host, nodes communicate just fine. But that depends on the availability of network interfaces. In my experience, having a single ethernet device available (that is also up) works most reliably. Without special configuration, I was not able to make CycloneDDS work using localhost. That is because localhost is reported to not support multicast under Linux, although using it for multicast works just fine.
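To see what the OS reports for each interface, something like the following rough sketch may help. It is not part of rules_ros2 or CycloneDDS; it assumes Linux with sysfs mounted at /sys/class/net, and the function names are made up for illustration. It only reads the interface flags and decodes the bits that matter here (up, loopback, multicast), so you can compare the output inside the container and on the host.

```python
import os

# Linux interface flag bits, as defined in <linux/if.h>.
IFF_UP = 0x1
IFF_LOOPBACK = 0x8
IFF_MULTICAST = 0x1000

# Assumption: sysfs is mounted here (true on typical Linux hosts and containers).
SYSFS_NET = "/sys/class/net"


def describe_flags(flags: int) -> dict:
    """Decode the flag bits relevant to DDS interface selection."""
    return {
        "up": bool(flags & IFF_UP),
        "loopback": bool(flags & IFF_LOOPBACK),
        "multicast": bool(flags & IFF_MULTICAST),
    }


def read_interface_flags(name: str) -> int:
    """Read the hexadecimal flags value the kernel exposes for one interface."""
    with open(os.path.join(SYSFS_NET, name, "flags")) as f:
        return int(f.read().strip(), 16)


def main() -> None:
    if not os.path.isdir(SYSFS_NET):
        print(f"{SYSFS_NET} not found; this sketch only works on Linux")
        return
    for name in sorted(os.listdir(SYSFS_NET)):
        try:
            flags = read_interface_flags(name)
        except (OSError, ValueError):
            # Skip entries that are not interfaces or cannot be read.
            continue
        print(name, describe_flags(flags))


if __name__ == "__main__":
    main()
```

If lo shows multicast as False while it is the only interface that is up, that would match the situation described above.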

You can experiment with the following configuration file:

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config
https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain id="any">
        <General>
            <Interfaces>
                <NetworkInterface name="lo" multicast="true" />
            </Interfaces>
        </General>
    </Domain>
</CycloneDDS>

Save that to a file cyclone_dds_localhost.xml (or whatever) and have CYCLONEDDS_URI point to it:

export CYCLONEDDS_URI=/absolute/path/to/cyclone_dds_localhost.xml

Then CycloneDDS will pick up that config and communicate only using localhost. With multicast="true" we override whatever the OS reports and tell CycloneDDS to assume that multicast is supported. With that config, I was able to have a node in a container (started with --network host) communicate with a node on the host.
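If you want to script this setup (for example in a test harness), a minimal sketch could look like the one below. The helper names are hypothetical and not part of any library; the XML is the same config as above, and the only thing CycloneDDS needs is the CYCLONEDDS_URI environment variable in the environment of the process you start.

```python
import os
from pathlib import Path

# Same loopback-only configuration as shown above.
LOOPBACK_CONFIG = """<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config
https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain id="any">
        <General>
            <Interfaces>
                <NetworkInterface name="lo" multicast="true" />
            </Interfaces>
        </General>
    </Domain>
</CycloneDDS>
"""


def write_loopback_config(directory: Path) -> Path:
    """Write the config into `directory` and return its absolute path."""
    path = directory / "cyclone_dds_localhost.xml"
    path.write_text(LOOPBACK_CONFIG, encoding="utf-8")
    return path.resolve()


def env_with_cyclonedds_uri(config_path: Path, base_env=None) -> dict:
    """Return a copy of the environment with CYCLONEDDS_URI pointing at the config."""
    env = dict(os.environ if base_env is None else base_env)
    env["CYCLONEDDS_URI"] = str(config_path)
    return env
```

The returned dictionary can be passed as env= to subprocess.run when starting the talker or listener; the exact command depends on how you launch your nodes.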