ros2 / rclpy

rclpy (ROS Client Library for Python)
Apache License 2.0
278 stars 224 forks source link

Incompatibility with calling spin_until_future_complete while in callback #1313

Open jmblixt3 opened 1 month ago

jmblixt3 commented 1 month ago

Bug report

Required Info:

Steps to reproduce issue

Utilizing demos repo with action tutorials_py and action_tutorials_cpp code Apply the following patch (its in a txt file because github issues don't like patches for some reason) patch

Then build and run the following nodes in separate terminals

ros2 run action_tutorials_py fibonacci_action_server

or

ros2 run action_tutorials_cpp fibonacci_action_server

and

ros2 run action_tutorials_py fibonacci_action_hybrid

Finally, run

ros2 action send_goal /some_action action_tutorials_interfaces/action/Fibonacci

Then run it again

ros2 action send_goal /some_action action_tutorials_interfaces/action/Fibonacci

Expected behavior

On the first execution of ros2 action send_goal we get the expected behavior of

Waiting for an action server to become available...
Sending goal:
     order: 0

Goal accepted with ID: 16988b561e414d82bc9fb1c793788156

Result:
    sequence:
- 0
- 1

Goal finished with status: SUCCEEDED

And any subsequent calls to send_goals to some_action work exactly the same only varying Goal ID

Actual behavior

Any Subsequent calls actually result in

Waiting for an action server to become available...
Sending goal:
     order: 0

^C

So the goal request is never processed by the action hybrid, after the first call.

Additional information

I initially found this when working on #1123, thinking there was a race condition within rclpy, but since the goal request is never getting received by the hybrid action. The default goal request processing is a simple return accepted leaving no place for potential race condition issues in that instance.

Barry-Xu-2018 commented 1 month ago

I suspect it's because the same executor (default executor) is being executed simultaneously on different threads.

I tried to use 2 executors (executor1 and default executor) and this issue doesn't occur.
fibonacci_action_hybrid.py.txt

jmblixt3 commented 1 month ago

Looking more closely looks like it isn't caused exclusively by calling a cpp action server and I've adjusted issue description and steps to reproduce

jmblixt3 commented 1 month ago

Looks like by doing something like this, can mitigate the issue, but feels really awkward. Here's a patch implementing this example.txt. Maybe this is just a fundamental limitation of nodes, but using different callback groups should be sufficient in my eyes anyway.

Barry-Xu-2018 commented 1 month ago

I suspect it's because the same executor (default executor) is being executed simultaneously on different threads.

This guess is incorrect. The issue is likely related to using the default SingleThreadedExecutor. I can't explain the root cause yet (It will take some time to pinpoint the cause.). I just tried switching to MultiThreadedExecutor and found that the problem didn't occur.
fibonacci_action_hybrid.py.txt

Barry-Xu-2018 commented 1 month ago

@jmblixt3

The root cause is related to rclpy.spin_until_future_complete().

rclpy.spin(node) is to add node to default executor.

The implementation of rclpy.spin_until_future_complete() is https://github.com/ros2/rclpy/blob/43198cbfe11c84480b15478f76d5fa3e998e5039/rclpy/rclpy/__init__.py#L324-L329

The last step is to remove node. After this step, no node is in executor. So no processing will be called at all (rclpy.spin()).

Change code as below

...
#rclpy.spin_until_future_complete(self,request_future)
rclpy.get_global_executor().spin_until_future_complete(request_future)
...
#rclpy.spin_until_future_complete(self,result_future)
rclpy.get_global_executor().spin_until_future_complete(result_future)
jmblixt3 commented 1 month ago

Would it be reasonable to check before adding the node if it is already in executor. If already present not remove it after completion?

Barry-Xu-2018 commented 1 month ago

Would it be reasonable to check before adding the node if it is already in executor. If already present not remove it after completion?

There won't be any duplicate Nodes added. Refer to line 271 https://github.com/ros2/rclpy/blob/43198cbfe11c84480b15478f76d5fa3e998e5039/rclpy/rclpy/executors.py#L263-L277

jmblixt3 commented 1 month ago

I understand it's not adding a duplicate. I was wondering if it would be appropriate to have the node not be removed from executor at the end if it was already spinning.

Barry-Xu-2018 commented 1 month ago

I was wondering if it would be appropriate to have the node not be removed from executor at the end if it was already spinning.

Spinning is the executor's responsibility. While spinning, the Nodes added to the executor can be changed.

The executor will recollect the entities from added nodes at an appropriate point.
Recollecting entities will occur after handling all the ready tasks returned by wait_set.wait().

https://github.com/ros2/rclpy/blob/43198cbfe11c84480b15478f76d5fa3e998e5039/rclpy/rclpy/executors.py#L567-L568

Barry-Xu-2018 commented 1 month ago

I saw your PR #1316 and I understand your idea. I misunderstood earlier.