ros2 / rmw_cyclonedds

ROS 2 RMW layer for Eclipse Cyclone DDS
Apache License 2.0

Previously configured peer never gets undiscovered, even if removed from the peer list. #520

Open OscarMrZ opened 3 weeks ago

OscarMrZ commented 3 weeks ago

Bug report

Required Info:

Steps to reproduce issue

I'm configuring unicast discovery, trying to replicate the behavior achieved in rolling with the new env variables.

This is the config in my pc:

<CycloneDDS>
    <Domain>
        <General>
            <AllowMulticast>false</AllowMulticast>
            <MaxMessageSize>65500B</MaxMessageSize>
        </General>
        <Discovery>
            <ParticipantIndex>auto</ParticipantIndex>
            <Peers>
                <Peer Address="localhost"/>
                <Peer Address="robot-hostname"/>
            </Peers>
            <MaxAutoParticipantIndex>500</MaxAutoParticipantIndex>
        </Discovery>
        <Internal>
            <SocketReceiveBufferSize min="10MB"/>
            <Watermarks>
                <WhcHigh>500kB</WhcHigh>
            </Watermarks>
        </Internal>
    </Domain>
</CycloneDDS>

And this is the config on my robot

<CycloneDDS>
    <Domain>
        <General>
            <AllowMulticast>false</AllowMulticast>
            <MaxMessageSize>65500B</MaxMessageSize>
        </General>
        <Discovery>
            <ParticipantIndex>auto</ParticipantIndex>
            <Peers>
                <Peer Address="localhost"/>
            </Peers>
            <MaxAutoParticipantIndex>500</MaxAutoParticipantIndex>
        </Discovery>
        <Internal>
            <SocketReceiveBufferSize min="10MB"/>
            <Watermarks>
                <WhcHigh>500kB</WhcHigh>
            </Watermarks>
        </Internal>
    </Domain>
</CycloneDDS>

To the best of my knowledge, this would be analogous to Peer A (my pc) configured with localhost and a static peer in the list and Peer B configured with localhost and no static peers on the list.

After a clean restart of everything ROS 2 related, I publish a simple test message from the robot:

ros2 topic pub /hello std_msgs/msg/String "data: 'hello'"

and in my pc

ros2 topic list

As expected, after a little while I can see the robot's topic on the PC and can confirm that all the traffic is unicast. After that, I would like to disconnect from the robot by removing it from the peer list, so that Peer A stops receiving messages from it and stops seeing its topics. To do that, I terminate all the ROS 2 processes (SIGTERM) and change the Cyclone config on the PC so that it now specifies only the localhost peer.
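For reference, the modified PC config is not shown in this report. A minimal sketch of it, assuming it is identical to the PC config above except that the robot-hostname peer has been removed, would be:

```xml
<CycloneDDS>
    <Domain>
        <General>
            <AllowMulticast>false</AllowMulticast>
            <MaxMessageSize>65500B</MaxMessageSize>
        </General>
        <Discovery>
            <ParticipantIndex>auto</ParticipantIndex>
            <Peers>
                <!-- Only localhost remains; the robot-hostname peer has been removed -->
                <Peer Address="localhost"/>
            </Peers>
            <MaxAutoParticipantIndex>500</MaxAutoParticipantIndex>
        </Discovery>
        <Internal>
            <SocketReceiveBufferSize min="10MB"/>
            <Watermarks>
                <WhcHigh>500kB</WhcHigh>
            </Watermarks>
        </Internal>
    </Domain>
</CycloneDDS>
```

Under that assumption, both machines end up with the same configuration, and neither lists the other as a static peer.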

Expected behavior

The robot (Peer B) stops sending messages to my PC once the lease duration specified in the SPDP message has expired. This would also mean that my PC is no longer aware of the topics on Peer B after it has been removed from the list of peers. I shouldn't need to restart Peer B, which is a robot that shouldn't care whether I am connected or not.

Actual behavior

Peer B continuously sends INFO_TS messages to my PC, never undiscovering it. This makes my pc discover Peer B again, seeing all its topics even if it is not in the peer list. Only killing all ros2 processes on the robot stops this behavior and achieves what I expect.

Additional information

Attached is a traffic capture demonstrating this behavior. Please let me know if this is due to a misunderstanding on my side about the undiscovery process, or if I am missing some configuration parameters. Thank you very much!

example_compressed.pcapng.gz

mjcarroll commented 2 weeks ago

@eboasson this sounds like a Cyclone-specific configuration issue rather than something at the RMW layer. Would you mind taking a look?

OscarMrZ commented 1 day ago

To add a bit more info: I'm receiving an INFO_TS message from the ros2 daemon and from my publisher, specifically every 8 seconds (which happens to be the heartbeat interval). However, this happens even with best-effort QoS, and to the best of my knowledge this should not be the case.

@mjcarroll I opened it here because I'm not sure whether this is Cyclone specific or a problem in the rmw implementation. Do you think it should be closed and reopened elsewhere?