Open rafael1193 opened 7 months ago
It should not be a bug.
There is a Result caching
in action design (Please refer to https://design.ros2.org/articles/actions.html).
Users can decide how long the goal result will be retained. However, it is not guaranteed that the goal result will be consistently retained even if it has not been accessed.
Have a node emit clock messages faster than wall clock moves (at x60)
@Barry-Xu-2018 this will make the result expired (removed) 60 times faster in the action server.
i think precisely that this situation could happen, but generating KeyError
exception is not designed behavior?
i think we can check if the key bytes(goal_uuid)
exists in _result_futures
dictionary, and, goal_handle
is also remove from _goal_handles
. probably it can print warning that says "Goal is already expired before completion", what do you think?
i think precisely that this situation could happen, but generating KeyError exception is not designed behavior?
Yes. Throwing an exception is not a correct behavior in this scenario.
i think we can check if the key bytes(goal_uuid) exists in _result_futures dictionary, and, goal_handle is also remove from _goal_handles. probably it can print warning that says "Goal is already expired before completion", what do you think?
Yes. I'll double-check the code logic to see if there are any other places that may need modification when this issue occurs.
These two parts involve asynchronous operations, so to completely avoid the issue, locking (asyncio.Lock) should be added when operating on self._result_futures
.
Thank you very much for your help.
In hindsight, some expiring timeout message was to be expected but not the action server failing. I have increased result_timeout
accordingly and the synchronization problem does not seem to appear anymore.
@Barry-Xu-2018 i am almost sure this has been fixed with https://github.com/ros2/rcl/pull/1121, what do you think?
i think the only thing missing here is to check the goal_uuid
key in _result_futures
during _execute_get_result_request
call. this could generate the key error when the result timeout is too short or simulation time like this issue report.
what do you think?
i think the only thing missing here is to check the goal_uuid key in _result_futures during _execute_get_result_request call.
I was thinking that the following patch, but i found that is not needed. because it already checks the goal_uuid
in the _goal_handles
. probably we can adjust the debug message a bit.
diff --git a/rclpy/rclpy/action/server.py b/rclpy/rclpy/action/server.py
index ce60309..06b93d4 100644
--- a/rclpy/rclpy/action/server.py
+++ b/rclpy/rclpy/action/server.py
@@ -472,8 +472,13 @@ class ActionServer(Generic[GoalT, ResultT, FeedbackT], Waitable['ServerGoalHandl
# There is an accepted goal matching the goal ID, register a callback to send the
# response as soon as it's ready
- self._result_futures[bytes(goal_uuid)].add_done_callback(
- functools.partial(self._send_result_response, request_header))
+ if bytes(goal_uuid) in self._result_futures:
+ self._result_futures[bytes(goal_uuid)].add_done_callback(
+ functools.partial(self._send_result_response, request_header))
+ else:
+ # This can happen when result_timeout is too short, or high-freq simulation time
+ self._logger.warn(
+ 'Requested result has been expired already (goal ID: {0})'.format(goal_uuid))
async def _execute_expire_goals(self, expired_goals: Tuple[GoalInfo, ...]) -> None:
for goal in expired_goals:
@Barry-Xu-2018 i am almost sure this has been fixed with ros2/rcl#1121, what do you think?
If result_timeout is still set to 0, this issue might still occur. This is because the current design doesn't ensure that the result is retrieved before it gets deleted. So it's the user's responsibility to set a reasonable result timeout.
ros2/rcl#1121 isn't aimed at this issue.
i think the only thing missing here is to check the
goal_uuid
key in_result_futures
during_execute_get_result_request
call. this could generate the key error when the result timeout is too short or simulation time like this issue report.what do you think?
Yes. This check is necessary.
Bug report
Required Info:
Steps to reproduce issue
use_sim_time
parameter asTrue
Expected behavior
Action goal result is set without problem.
Actual behavior
The action server fails because
self._result_futures[bytes(goal_uuid)].set_result(result_response)
in the ActionServer _execute_goal method raises aKeyError
exception:Additional information
This occurrence of this behavior is inconsistent, but only happens in accelerated time.
It looks like a synchronization issue with
ActionServer._execute_expire_goals
that removes the goal result future from the_result_futures
dictionary before_execute_goal
sets its result. Checkingnot self._result_futures[goal_uuid].done()
before removing the expired goal seems to mitigate the issue, but the actual source of the problem may be elsewhere.