Closed svwilliams closed 6 months ago
Looks like you're not the only one:
https://github.com/ros-controls/ros2_controllers/issues/1101
Slightly different use case, but the net result is the same. Looking at other issues linked there, I see a comment from William Woodall:
https://github.com/ros2/rclcpp/pull/2142#issuecomment-2078297424
It feels like there has to be some kind of dangling reference to the sub somewhere. You are using this overload:
The sub
should go out of scope, but if something else is keeping a copy of the shared pointer, then it will obviously persist.
Your solution is obviously find, but I wonder if you used the other overload, which forces you to create the sub yourself, then you could manually call reset()
on the sub shared pointer within the loop every time. Shouldn't be necessary, but maybe it would work? If you did use that overload, you could also call use_count
on the sub pointer to see if someone is maintaining a reference to it somewhere.
Maybe? It's hard to know if that test is throwing a fault on the first call to wait_for_message
or a subsequent call. It's in a loop, so I am wondering if it's the second call that results in the error.
Just for fun I added some tracing print statements.
It looks like the error throws on the very first call to wait_for_message()
.
[test_fixed_lag_ignition-1] Waiting for message (0)...
[test_fixed_lag_ignition-1] terminate called after throwing an instance of 'std::runtime_error'
[test_fixed_lag_ignition-1] what(): subscription already associated with a wait set
[test_fixed_lag_ignition-1] Aborted (core dumped)
But as Tom/Carlos suggest:
automatically_add_to_executor_with_node
parameter set to false (which is undocumented)
while ((odom_msg.header.stamp != rclcpp::Time(3, 0,
RCL_ROS_TIME)) && (node->now() < result_timeout))
{
rclcpp::SubscriptionOptions opts;
opts.callback_group =
node->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive, false);
auto sub = node->create_subscription<nav_msgs::msg::Odometry>(
"/odom", 1, [](const std::shared_ptr<const nav_msgs::msg::Odometry>) {}, opts);
rclcpp::wait_for_message(odom_msg, sub, node->get_node_options().context(),
std::chrono::seconds(1));
}
This does work.
However, now that I see what is happening under the hood, I'm not in love with the loop creating a subscriber every time. That seems like it could lead to missed/dropped messages? If the external node published a message at just the wrong time?
I think I'm going to leave the unit test as-is -- create a normal subscriber and wait for the expected message. This feels like it requires far less explanation for how the test works.
But I did realize I introduced a threading problem. The last commit adds a mutex to guard access to the stored odometry message.
Checking if adding a comment will trigger the upstream CI build to run
I don't understand. The last PR magically triggered CI jobs after I added a comment. This one refuses. Merging anyway.
Fix fuse optimizer unit test. The
rclcpp::wait_for_message
call was throwing an exception 'subscription already associated with a wait set'. I have no idea why. My instinct is that the node in this unit test is using a single-threaded executor with a dedicated spin thread, and that maybe therclcpp::wait_for_message
is also trying to spin that same subscriber? :shrug:I switched the test to use a normal subscriber + callback function. It's not the prettiest thing, but it's not terrible either. I also needed to add a delay between publishing
pose_msg1
andpose_msg2
. Again, I have no idea why this is needed. I have the publisher queue set to 5, and the pose subscriber has a default queue size of 10. I'm not sure why the first message is getting lost.