Closed Karsten1987 closed 4 years ago
Is the problem that when the following line throws
https://github.com/ros2/ros2cli/blob/fd48f1158c992c9938c6edcef887acf0fe93efac/ros2topic/ros2topic/verb/pub.py#L141
and enters the except
block, the "partially created" publisher is not actually destroyed, but ends up dangling?
Would it work if we explicitly destroyed that "partially created" publisher (under the hood, there would be calls to rmw_create_publisher()
-> rmw_destroy_publisher()
-> rmw_create_publisher()
in that sequence)?
Under the hood at the rmw
level, creating a publisher and creating publisher events are fully decoupled (except creating the publisher must come before creating its events). At the rclcpp
and rclpy
level, they're done in one function call.
So we could also instead create publishers/subscriptions in ros2cli
/ros2topic
in some non-standard decoupled way.
At the rclcpp and rclpy level, they're done in one function call.
I am not sure I fully understand the difference between what's done in rclpy and what's applied here in the command line tool. When executing the talker
node inside the demo_nodes_py
package, the publisher within is only created once. So I am not sure why the same pattern can't be applied for ros2 topic pub
.
So we could also instead create publishers/subscriptions in ros2cli/ros2topic in some non-standard decoupled way.
That sounds like a decent way to solve this.
I don't think that the publisher is dangling here, but I don't see the point of creating an entity on the DDS / middleware level just to figure out if an orthogonal feature, the events, are compatible or not. Also, one might create some non-zero work within the create function, such as notifying other components. That might lead to strange behavior if a publisher appears, disappears and directly re-appears. Granted, in the case of the command line tool this might be rather insignificant, but still.
When executing the
talker
node inside thedemo_nodes_py
package, the publisher within is only created once. So I am not sure why the same pattern can't be applied forros2 topic pub
.
The publisher/subscription in ros2 topic pub
was updated explicitly with the ability to notify the user there exists corresponding subscriptions/publishers on the same topic that has incompatible QoS policies (which will result in no communication with those particular subscriptions/publishers). The publisher/subscription being created in the demo_nodes_py
package are not being created with that ability.
Actually, come to think of it, with https://github.com/ros2/rclcpp/pull/1051 and https://github.com/ros2/rclpy/pull/536 merged, https://github.com/ros2/ros2cli/pull/410 can just be reverted. Thoughts, @ivanpauno and @wjwwood?
Reverting #410 sounds good to me.
If the creation of the publisher failed, it should be automatically cleaned up, and the user shouldn't need to do that manually. It's worth investigating why it wasn't cleaned up correctly and fixing it too.
It's worth investigating why it wasn't cleaned up correctly and fixing it too.
I think it's the expected behavior of using Python, which doesn't manage resources like C++'s RAII, meaning it won't greedily destroy pub
when execution exits the try
scope before entering the except
scope.
I think it's the expected behavior of using Python, which doesn't manage resources like C++'s RAII, meaning it won't greedily destroy pub when execution exits the try scope before entering the except scope.
The C function should handle errors, and cleanup everything that was created before the failure.
The C functions at the rmw
layer for creating a publisher and creating publisher events (registering the callbacks) are completely decoupled, so rmw_create_publisher()
would not know that rmw_publisher_event_init()
failed and so would not clean up the rmw_publisher_t
.
As far as I know, rclpy
doesn't wrap rmw_create_publisher()
and rmw_publisher_event_init()
into a single C function at the rclpy
level; they're called individually from Python.
As far as I know, rclpy doesn't wrap rmw_create_publisher() and rmw_publisher_event_init() into a single C function at the rclpy level; they're called individually from Python.
In that case, if the second one fail when a publisher
is being created, the publisher should be destroyed.
In that case, if the second one fail when a publisher is being created, the publisher should be destroyed.
Just to be clear, there are two publishers being created. Both being fully established and "correct" - meaning they are advertised on the wire etc. It's not that the publisher fails.
I feel the try/catch should really only cover the part of the event callback and not the creation of a publisher.I might not see the complete picture here, but it's not the publisher which fails, but the _eventpublisher and thus the code should check accordingly. Especially, if both things are actually decoupled on RMW layer.
Just to be clear, there are two publishers being created. Both being fully established and "correct" - meaning they are advertised on the wire etc. It's not that the publisher fails.
Yeah, but that's an error because Node.create_publisher
doesn't handle errors correctly:
and that should be corrected (i.e. if create_publisher
is raising an exception, a publisher should NOT be created).
Summary of an offline discussion with @Karsten1987:
It would be good in the future to first check if the requested events are available in the rmw
implementation, and then create the publisher and install the callbacks.
That will avoid unnecessary creating a publisher
, and then destroying it.
That will involve API changes in rmw
and rcl
, so it should be done after Foxy
.
For now, the following two improvements can be done:
Node.create_publisher
should handle errors properly. Functions that raise an error are NOT expected to leak, and should clean up the resources that they have created before raising (not having RAII is not an excuse to avoid doing cleanup).@Karsten1987 please confirm if I have summarized correctly.
Thanks for the summary, that addresses my concerns quite well. I would stress your second bullet point though. I don't think that's an improvement, it's a legit leak and should be addressed ;-)
Okay, I agree with making the two improvements.
I'll try to finish them by the end of this week.
Hi @Karsten1987, can you confirm if https://github.com/ros2/rmw_iceoryx/issues/28 can be resolved with https://github.com/ros2/rclpy/pull/553 or with https://github.com/ros2/ros2cli/pull/496 individually? Although we are aiming to merge in both fixes, they are expected to be capable of resolving your issue with rmw_iceoryx
on their own individually.
@mm318 I can confirm that https://github.com/ros2/rclpy/pull/553 solves the problem.
As mentioned here the QOS event callback triggers a call to
rmw_create_publisher
twice for the same topic on the same node. Where this might be supported for DDS implementations, it might lead to UB for non-DDS RMWs, c.f. https://github.com/ros2/rmw_iceoryx/issues/28Is there a way to create a create incompatible QoS with creating only one publisher?