ros2 / rmw_cyclonedds

ROS 2 RMW layer for Eclipse Cyclone DDS
Apache License 2.0
112 stars 91 forks source link

Lifespan not working with transiet_local subscriber #473

Open charlielito opened 1 year ago

charlielito commented 1 year ago

Bug report

When subscribing to a transient_local published topic and using the Lifespan parameter, it seems that it doesn't have any effect on the subscriber's side. That means the callback of the subscriber is always called with the message ignoring the Lifespan parameter you use.

Required Info:

Steps to reproduce issue

To reproduce the issue we have 2 nodes: one that publishes to /some_topic with transient_local and infinite lifespan QoS and the other node that subscribes to that topic also with transient_local QoS but with a lifespan of 2 seconds.

Run this publisher node in one Terminal

import rclpy
from rclpy.node import Node
from rclpy.qos import (
    QoSDurabilityPolicy,
    QoSProfile,
    QoSReliabilityPolicy,
)

from std_msgs.msg import Bool

class PubNode(Node):
    def __init__(self):
        super().__init__("pub_node")

        qos = QoSProfile(
            depth=5,
            durability=QoSDurabilityPolicy.TRANSIENT_LOCAL,
            reliability=QoSReliabilityPolicy.RELIABLE,
        )

        self.pub = self.create_publisher(Bool, "/some_topic", qos)
        self.pub.publish(Bool(data=False))

def main(args=None):
    rclpy.init(args=args)
    node = PubNode()

    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass

    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()

Then run the subscriber node in another terminal

from rclpy.node import Node
from rclpy.qos import (
    QoSDurabilityPolicy,
    QoSProfile,
    QoSReliabilityPolicy,
)
from std_msgs.msg import Bool
from rclpy.duration import Duration
import rclpy

class SubNode(Node):
    def __init__(self):
        super().__init__("sub_node")

        qos = QoSProfile(
            depth=5,
            durability=QoSDurabilityPolicy.TRANSIENT_LOCAL,
            reliability=QoSReliabilityPolicy.RELIABLE,
            lifespan=Duration(seconds=2),
        )
        print(qos)
        self.sub = self.create_subscription(
            Bool,
            "/some_topic",
            self.my_callback,
            qos,
        )

    def my_callback(self, msg: Bool):
        self.get_logger().info("Received: {}".format(msg.data))

def main(args=None):
    rclpy.init(args=args)
    node = SubNode()

    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass

    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()

Expected behavior

Because of the lifespan of 2 seconds, the callback shouldn't be triggered if the node is launched after 2 seconds of the publisher node

Actual behavior

The callback is always triggered

Additional information

In the subscriber node, I printed the QoS object to check the properties. I got this:

QoSProfile(history=HistoryPolicy.KEEP_LAST, depth=5, reliability=ReliabilityPolicy.RELIABLE, durability=DurabilityPolicy.TRANSIENT_LOCAL, lifespan=2000000000 nanoseconds, deadline=0 nanoseconds, liveliness=LivelinessPolicy.SYSTEM_DEFAULT, liveliness_lease_duration=0 nanoseconds, avoid_ros_namespace_conventions=False)

But when checking the info with ros2 topic info /some_topic --verbose I got:

Node name: sub_node
Node namespace: /
Topic type: std_msgs/msg/Bool
Topic type hash: RIHS01_feb91e995ff9ebd09c0cb3d2aed18b11077585839fb5db80193b62d74528f6c9
Endpoint type: SUBSCRIPTION
GID: 01.10.a5.80.f1.17.ec.a8.06.91.a8.ae.00.00.14.04
QoS profile:
  Reliability: RELIABLE
  History (Depth): KEEP_LAST (5)
  Durability: TRANSIENT_LOCAL
  Lifespan: Infinite
  Deadline: Infinite
  Liveliness: AUTOMATIC
  Liveliness lease duration: Infinite

which says the Lifespan is Infinite. Somehow the QoS is not being set correctly (?)

fujitatomoya commented 12 months ago

@charlielito

DDS implementation: default

That is supposed to mean you use rmw_fastrtps_cpp with iron, can you confirm?

i think, some rmw impelementation cannot show the actual QoS setting via ros2 topic info -v, this depends on rmw implementation. for example, rmw_fastrtps_cpp cannot show the history and depth QoS configuration.

there's been discussion on this, see more details for https://github.com/ros2/rclpy/pull/849#issuecomment-988548722

this is the conversion function RTPS QoS to RMW QoS, describes above.

https://github.com/ros2/rmw_fastrtps/blob/8eeb955f581c9cd24761185557ba7f46ef6e1d4c/rmw_fastrtps_shared_cpp/include/rmw_fastrtps_shared_cpp/qos.hpp#L128-L130

besides, here is what i see.

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclpy rclpy_1163_pub
...

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclpy rclpy_1163_sub
...

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 topic info -v /some_topic
Type: std_msgs/msg/Bool

Publisher count: 1

Node name: pub_node
Node namespace: /
Topic type: std_msgs/msg/Bool
Topic type hash: RIHS01_feb91e995ff9ebd09c0cb3d2aed18b11077585839fb5db80193b62d74528f6c9
Endpoint type: PUBLISHER
GID: 01.0f.60.db.44.f7.83.bd.00.00.00.00.00.00.13.03
QoS profile:
  Reliability: RELIABLE
  History (Depth): UNKNOWN
  Durability: TRANSIENT_LOCAL
  Lifespan: Infinite
  Deadline: Infinite
  Liveliness: AUTOMATIC
  Liveliness lease duration: Infinite

Subscription count: 1

Node name: sub_node
Node namespace: /
Topic type: std_msgs/msg/Bool
Topic type hash: RIHS01_feb91e995ff9ebd09c0cb3d2aed18b11077585839fb5db80193b62d74528f6c9
Endpoint type: SUBSCRIPTION
GID: 01.0f.60.db.18.f8.bd.4f.00.00.00.00.00.00.13.04
QoS profile:
  Reliability: RELIABLE
  History (Depth): UNKNOWN
  Durability: TRANSIENT_LOCAL
  Lifespan: 2000000000 nanoseconds
  Deadline: Infinite
  Liveliness: AUTOMATIC
  Liveliness lease duration: Infinite

thanks,

charlielito commented 12 months ago

@fujitatomoya my bad, actually, I was using Cyclone DDS export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp After changing it to rmw_fastrtps_cpp I can confirm your results and the expected behaviour. In that case what should we do with this issue? Who can tackle this in cyclone DDS implementation?

clalancette commented 12 months ago

I'm going to move this over to the rmw_cyclonedds_cpp repository for now.

eboasson commented 11 months ago

This is very closely related to https://github.com/eclipse-cyclonedds/cyclonedds/issues/1835

This issue has an interesting difference with that one, though: the lifespan in DDS is spec'd to use the time of publication of the data (the "source timestamp") and so is probably never going to work in a useful way with transient-local data because the historical data would probably have expired long before it reached the reader. Therefore, simply adding a notion of lifespan to reader (as I noted as a possible way of addressing https://github.com/eclipse-cyclonedds/cyclonedds/issues/1835) is not going to improve things here ...

So how badly needed is it? If it is not really important, then a documentation update is a reasonable approach, else we need to figure out how to deal with it without breaking the spec entirely.

charlielito commented 10 months ago

IMO this should be fixed since the behavior should be the same independently of the DDS implementation.

eboasson commented 10 months ago

IMO this should be fixed since the behavior should be the same independently of the DDS implementation.

Are you sure that's what you mean? Because that's equivalent to saying the combination is not supported because it is impossible without using vendor-specific extensions to the DDS specification.

charlielito commented 10 months ago

I mean that it is very confusing that it behaves differently without any warning because the user would expect the same behavior. I think the feature is also a reasonable one since lifespan is an important part of the QoS specs