ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
562 stars 425 forks source link

Waitset not triggering timer when initialized in lifecycle node #2652

Open luca-della-vedova opened 1 month ago

luca-della-vedova commented 1 month ago

Bug report

Required Info:

Steps to reproduce issue

ros2 lifecycle set /minimal_timer configure
ros2 lifecycle set /minimal_timer activate
ros2 run examples_rclcpp_minimal_service service_main

Expected behavior

Since the service callback destroys only timer_, timer2_ should keep running and Hello, world! should keep printing.

Actual behavior

All timers are stopped and no Hello, world! is printed.

Additional information

The following might shed some light into this.

So I suspect there is something about how timers (and maybe other entities?) are added to wait sets inside lifecycle node transitions that creates this issue.

mjcarroll commented 1 month ago

This definitely sounds like a bug.

If I manually add a || true here making this condition always evaluate to true,

This is going to force the waitset to always be rebuilt, which is not good for performance. It's likely that the bug is somewhere else (in how timers are added?)

I probably won't get a chance to look at this until after roscon, but thank you for the full repro example and detailed steps.

alsora commented 1 month ago

Yes, I agree that this sounds like a bug.

So I suspect there is something about how timers (and maybe other entities?) are added to wait sets inside lifecycle node transitions that creates this issue.

I wouldn't exclude that this is "timers-specific" since timers are handled in a slightly different ways from other entities. Moreover, lifecycle nodes do not do anything special here, so the same problem should be reproducible in a regular node (executors don't really care about nodes, they only care about callback groups)

mjcarroll commented 1 month ago

lifecycle nodes do not do anything special here

My assumption as well is that something about the execution order from lifecycle nodes is surfacing this, but it's not something that would be unique to lifecycle nodes.

fmrico commented 1 month ago

Hi,

I discovered this bug five days ago at https://github.com/fmrico/cascade_lifecycle/pull/14, and now I am solving a problem that is directly related at PlanSys2 (https://github.com/ros/rosdistro/pull/43212#issuecomment-2418582680). I am solving this now by changing the SingleThreadedExecutors to EventsExecutors in some cases and removing the node before destroying it in other cases.

Basically, if I have two nodes in an executor, each one with a timer, if I destroy one of the nodes, the timer callback in the other node is no longer called.

I can investigate more if you want.