ros2 / ros2cli

ROS 2 command line interface tools
Apache License 2.0
183 stars 164 forks source link

ros2 topic pub creates two publishers #500

Closed Karsten1987 closed 4 years ago

Karsten1987 commented 4 years ago

As mentioned here the QOS event callback triggers a call to rmw_create_publisher twice for the same topic on the same node. Where this might be supported for DDS implementations, it might lead to UB for non-DDS RMWs, c.f. https://github.com/ros2/rmw_iceoryx/issues/28

Is there a way to create a create incompatible QoS with creating only one publisher?

mm318 commented 4 years ago

Is the problem that when the following line throws https://github.com/ros2/ros2cli/blob/fd48f1158c992c9938c6edcef887acf0fe93efac/ros2topic/ros2topic/verb/pub.py#L141 and enters the except block, the "partially created" publisher is not actually destroyed, but ends up dangling?

Would it work if we explicitly destroyed that "partially created" publisher (under the hood, there would be calls to rmw_create_publisher() -> rmw_destroy_publisher() -> rmw_create_publisher() in that sequence)?

mm318 commented 4 years ago

Under the hood at the rmw level, creating a publisher and creating publisher events are fully decoupled (except creating the publisher must come before creating its events). At the rclcpp and rclpy level, they're done in one function call.

So we could also instead create publishers/subscriptions in ros2cli/ros2topic in some non-standard decoupled way.

Karsten1987 commented 4 years ago

At the rclcpp and rclpy level, they're done in one function call.

I am not sure I fully understand the difference between what's done in rclpy and what's applied here in the command line tool. When executing the talker node inside the demo_nodes_py package, the publisher within is only created once. So I am not sure why the same pattern can't be applied for ros2 topic pub.

So we could also instead create publishers/subscriptions in ros2cli/ros2topic in some non-standard decoupled way.

That sounds like a decent way to solve this.

I don't think that the publisher is dangling here, but I don't see the point of creating an entity on the DDS / middleware level just to figure out if an orthogonal feature, the events, are compatible or not. Also, one might create some non-zero work within the create function, such as notifying other components. That might lead to strange behavior if a publisher appears, disappears and directly re-appears. Granted, in the case of the command line tool this might be rather insignificant, but still.

mm318 commented 4 years ago

When executing the talker node inside the demo_nodes_py package, the publisher within is only created once. So I am not sure why the same pattern can't be applied for ros2 topic pub.

The publisher/subscription in ros2 topic pub was updated explicitly with the ability to notify the user there exists corresponding subscriptions/publishers on the same topic that has incompatible QoS policies (which will result in no communication with those particular subscriptions/publishers). The publisher/subscription being created in the demo_nodes_py package are not being created with that ability.

mm318 commented 4 years ago

Actually, come to think of it, with https://github.com/ros2/rclcpp/pull/1051 and https://github.com/ros2/rclpy/pull/536 merged, https://github.com/ros2/ros2cli/pull/410 can just be reverted. Thoughts, @ivanpauno and @wjwwood?

ivanpauno commented 4 years ago

Reverting #410 sounds good to me.

If the creation of the publisher failed, it should be automatically cleaned up, and the user shouldn't need to do that manually. It's worth investigating why it wasn't cleaned up correctly and fixing it too.

mm318 commented 4 years ago

It's worth investigating why it wasn't cleaned up correctly and fixing it too.

I think it's the expected behavior of using Python, which doesn't manage resources like C++'s RAII, meaning it won't greedily destroy pub when execution exits the try scope before entering the except scope.

ivanpauno commented 4 years ago

I think it's the expected behavior of using Python, which doesn't manage resources like C++'s RAII, meaning it won't greedily destroy pub when execution exits the try scope before entering the except scope.

The C function should handle errors, and cleanup everything that was created before the failure.

mm318 commented 4 years ago

The C functions at the rmw layer for creating a publisher and creating publisher events (registering the callbacks) are completely decoupled, so rmw_create_publisher() would not know that rmw_publisher_event_init() failed and so would not clean up the rmw_publisher_t.

As far as I know, rclpy doesn't wrap rmw_create_publisher() and rmw_publisher_event_init() into a single C function at the rclpy level; they're called individually from Python.

ivanpauno commented 4 years ago

As far as I know, rclpy doesn't wrap rmw_create_publisher() and rmw_publisher_event_init() into a single C function at the rclpy level; they're called individually from Python.

In that case, if the second one fail when a publisher is being created, the publisher should be destroyed.

Karsten1987 commented 4 years ago

In that case, if the second one fail when a publisher is being created, the publisher should be destroyed.

Just to be clear, there are two publishers being created. Both being fully established and "correct" - meaning they are advertised on the wire etc. It's not that the publisher fails.

I feel the try/catch should really only cover the part of the event callback and not the creation of a publisher.I might not see the complete picture here, but it's not the publisher which fails, but the _eventpublisher and thus the code should check accordingly. Especially, if both things are actually decoupled on RMW layer.

ivanpauno commented 4 years ago

Just to be clear, there are two publishers being created. Both being fully established and "correct" - meaning they are advertised on the wire etc. It's not that the publisher fails.

Yeah, but that's an error because Node.create_publisher doesn't handle errors correctly:

https://github.com/ros2/ros2cli/blob/fd48f1158c992c9938c6edcef887acf0fe93efac/ros2topic/ros2topic/verb/pub.py#L140-L144

and that should be corrected (i.e. if create_publisher is raising an exception, a publisher should NOT be created).

ivanpauno commented 4 years ago

Summary of an offline discussion with @Karsten1987:

It would be good in the future to first check if the requested events are available in the rmw implementation, and then create the publisher and install the callbacks. That will avoid unnecessary creating a publisher, and then destroying it.

That will involve API changes in rmw and rcl, so it should be done after Foxy.

For now, the following two improvements can be done:

@Karsten1987 please confirm if I have summarized correctly.

Karsten1987 commented 4 years ago

Thanks for the summary, that addresses my concerns quite well. I would stress your second bullet point though. I don't think that's an improvement, it's a legit leak and should be addressed ;-)

mm318 commented 4 years ago

Okay, I agree with making the two improvements.

I'll try to finish them by the end of this week.

mm318 commented 4 years ago

Hi @Karsten1987, can you confirm if https://github.com/ros2/rmw_iceoryx/issues/28 can be resolved with https://github.com/ros2/rclpy/pull/553 or with https://github.com/ros2/ros2cli/pull/496 individually? Although we are aiming to merge in both fixes, they are expected to be capable of resolving your issue with rmw_iceoryx on their own individually.

Karsten1987 commented 4 years ago

@mm318 I can confirm that https://github.com/ros2/rclpy/pull/553 solves the problem.