ros2 / rclpy

rclpy (ROS Client Library for Python)
Apache License 2.0
268 stars 221 forks source link

Fixes Action.*_async futures never complete #1308

Open jmblixt3 opened 3 days ago

jmblixt3 commented 3 days ago

Replaces #1125 since original contributor went inactive

Closes #1123

If two separate client server actions are running in separate executors the future given to the ActionClient will never complete due to a race condition on the rcl handles

Tested using this from @apockill which was adapted to match rolling then test with and without locks

To reproduce initial issue remove the lock/sleep in the RaceyAction client

Adapted example Code here
**client.py** ```python import rclpy from rclpy import Future from rclpy.action import ActionClient from rclpy.callback_groups import MutuallyExclusiveCallbackGroup from rclpy.executors import MultiThreadedExecutor from rclpy.node import Node from test_msgs.action import Fibonacci as SomeMsg from time import sleep class RaceyActionClient(ActionClient): def _get_result_async(self, goal_handle): """ Request the result for an active goal asynchronously. :param goal_handle: Handle to the goal to cancel. :type goal_handle: :class:`ClientGoalHandle` :return: a Future instance that completes when the get result request has been processed. :rtype: :class:`rclpy.task.Future` instance """ if not isinstance(goal_handle, ClientGoalHandle): raise TypeError( 'Expected type ClientGoalHandle but received {}'.format(type(goal_handle))) result_request = self._action_type.Impl.GetResultService.Request() result_request.goal_id = goal_handle.goal_id future = Future() with self._lock: sleep(1) sequence_number = self._client_handle.send_result_request(result_request) if sequence_number in self._pending_result_requests: raise RuntimeError( 'Sequence ({}) conflicts with pending result request'.format(sequence_number)) sleep(1) self._pending_result_requests[sequence_number] = future sleep(1) self._result_sequence_number_to_goal_id[sequence_number] = result_request.goal_id sleep(1) future.add_done_callback(self._remove_pending_result_request) sleep(1) # Add future so executor is aware self.add_future(future) return future class SomeActionClient(Node): def __init__(self): super().__init__("action_client") self._action_client = RaceyActionClient( self, SomeMsg, "some_action", callback_group=MutuallyExclusiveCallbackGroup(), ) self.call_action_timer = self.create_timer( timer_period_sec=10, callback=self.send_goal, callback_group=MutuallyExclusiveCallbackGroup(), ) def send_goal(self): goal_msg = SomeMsg.Goal() # self._action_client.wait_for_server() result = self._action_client.send_goal(goal_msg) print(result) def main(args=None): rclpy.init(args=args) action_client = SomeActionClient() executor = MultiThreadedExecutor() rclpy.spin(action_client, executor) if __name__ == "__main__": main() ``` **server.py** ```python import rclpy from rclpy.action import ActionServer from rclpy.node import Node from test_msgs.action import Fibonacci as SomeMsg class SomeActionServer(Node): def __init__(self): super().__init__("fibonacci_action_server") self._action_server = ActionServer( self, SomeMsg, "some_action", self.execute_callback ) def execute_callback(self, goal_handle): self.get_logger().info("Executing goal...") result = SomeMsg.Result() return result def main(args=None): rclpy.init(args=args) action_server = SomeActionServer() rclpy.spin(action_server) if __name__ == "__main__": main() ```
Before and after log results
**Before** client output ``` ^C ``` server output ``` [INFO] [1719525385.717267202] [fibonacci_action_server]: Executing goal... [WARN] [1719525385.718010542] [fibonacci_action_server.action_server]: Goal state not set, assuming aborted. Goal ID: [159 252 58 99 54 112 68 36 153 158 127 113 184 116 59 16] ^C ``` **After** client output ``` [INFO] [1719525608.692803958] [fibonacci_action_server]: Executing goal... [WARN] [1719525608.693434438] [fibonacci_action_server.action_server]: Goal state not set, assuming aborted. Goal ID: [237 233 91 193 44 134 67 64 132 12 50 224 217 180 42 44] [INFO] [1719525638.104589089] [fibonacci_action_server]: Executing goal... [WARN] [1719525638.105044519] [fibonacci_action_server.action_server]: Goal state not set, assuming aborted. Goal ID: [ 79 65 88 239 113 117 68 85 160 143 252 189 69 104 34 138] [INFO] [1719525648.104883887] [fibonacci_action_server]: Executing goal... [WARN] [1719525648.105726567] [fibonacci_action_server.action_server]: Goal state not set, assuming aborted. Goal ID: [ 88 208 33 171 9 53 70 30 178 102 119 74 28 22 199 124] ^C ``` ``` test_msgs.action.Fibonacci_GetResult_Response(status=6, result=test_msgs.action.Fibonacci_Result(sequence=[])) test_msgs.action.Fibonacci_GetResult_Response(status=6, result=test_msgs.action.Fibonacci_Result(sequence=[])) test_msgs.action.Fibonacci_GetResult_Response(status=6, result=test_msgs.action.Fibonacci_Result(sequence=[])) ^C ```

As you can see the warn from action server is still present, but the client no longer deadlocks, fixing the initial issue. Ideally that would be fixed as well, but I couldn't determine a solution to that.

I also initially tried to add locks to just the action client, but I was getting test failures in test_action_graph on the get_names_and_types function, so I added for action server as well.

jmblixt3 commented 2 days ago

i would add @aditya2592 as co-author, since this borrows the code from #1125.

Done