Closed clalancette closed 4 years ago
My understanding is that even if the goal is terminated clients should still be able to get the result (see design doc on result caching). Whether this is actually implemented properly, I'm not sure. So I would think that this is a valid (though perhaps rare) use-case.
After investigate this bug, the root cause is that action client call 'async_get_result()' after the goal status has been changed to SUCCESSED. https://github.com/ros2/rclcpp/blob/fd3655c26cc363c128402eaab7f563bdd17a3b7c/rclcpp_action/include/rclcpp_action/client.hpp#L565-L589
While status of goal is changed to 'SUCCESSED', goalhandles will remove this goal ID. And then if calling async_get_result()
At 406 line, goalhandles.count(goal_handle->get_goal_id()) return 0 since there is no goal ID in goalhandles.
So you get error 'Goal handle is not known to this client'.
In above test codes, send goal result in goal_accepted callback function 'acceptedCallback()'. But goal_accepted callback is called while action service receive goal_request from action client. Author consider this problem, so he adds time wait (line 71). But this wait time cannot be suitable for all platform.
About solution, there is 3 solution.
Simple solution. We can extend the wait time in test_buffer_client.cpp.
Add new API for action client. (Not suggestion. ) While receiving goal status 'SUCCESSED', only remove goal ID in goalhandles. At this time, this goal_handle also exist. Can add new API to get the result of goal.
Like @jacobperron said, this goal result should be able in action server as design description.
Result caching
The server should cache the result once it is ready so multiple clients have to opportunity to get it.
This is also useful for debugging/introspection tools.
Based on current codes of action server, goal result will be cached for 15 minutes (by default). But there are no interfaces (server side and client side) to get this goal result again once the status of goal is SUCCESSED. So solution 3 is to implement these interfaces. But modification is big.
@Barry-Xu-2018
goal result will be cached for 15 minutes (by default)
this is cached in rcl action layer, right. but there is no way to get it from rclcpp. and also this is configurable on rcl via result_timeout.
Like @jacobperron said, this goal result should be able in action server as design description.
i do agree with @jacobperron design and requirement, but i am not so sure how to implement in details.
best,
@fujitatomoya
this is cached in rcl action layer, right. but there is no way to get it from rclcpp. and also this is configurable on rcl via result_timeout.
Yes.
Like @jacobperron said, this goal result should be able in action server as design description. i do agree with @jacobperron design and requirement, but i am not so sure how to implement in details.
Based on current rclcpp codes, I think multiple clients have to opportunity to get it isn't implemented. Current cached goal result in action server is only for debugging/introspection tools ?
@Barry-Xu-2018
multiple clients have to opportunity to get it isn't implemented.
No, me neither.
i believe this is clearly a bug, because it is against the design. there will be use case that client needs to be able to get the result anyway.
@fujitatomoya
i believe this is clearly a bug, because it is against the design. there will be use case that client needs to be able to get the result anyway.
Yes. Current action design doesn't clearly describe how multiple client get cached goal result. Author of design need to add more description (such as get cached goal result from action client side) on getting cached goal result.
@Barry-Xu-2018
Current action design doesn't clearly describe how multiple client get cached goal result.
i guess that all of the clients should be able to get the result at least once.
@Barry-Xu-2018 Thanks for the investigation! Yes, it looks like the result caching mechanism isn't implemented on the client side. The action server should be properly caching the goals, and using the rcl_action helper to expire them. This means I would expect the status message to contain terminated goals until they expire and result messages to be available as well. I'll look into fixing the client library so that it conforms the the design doc, which should hopefully resolve this issue.
@jacobperron
Thanks for your comments.
I'll look into fixing the client library so that it conforms the the design doc, which should hopefully resolve this issue.
Do you want to fix this yourself ?
I am also interesting to implement it. But current design doesn't clearly describe the behavior of getting cache goal result for terminated goal. Such as how multiple clients get cached goal result, only get cache result once or no limitation.
From my investigation, it looks like the action server is caching the results as expected. For example,
Start an action server
ros2 run action_tutorials_cpp fibonacci_action_server
Echo the status topic
ros2 topic --include-hidden-topics echo /fibonacci/_action/status
Send a couple goals and let them complete
ros2 action send_goal -f /fibonacci action_tutorials_interfaces/action/Fibonacci "order: 3"
ros2 action send_goal -f /fibonacci action_tutorials_interfaces/action/Fibonacci "order: 3"
You'll notice that multiple goal IDs exist in the status list, implying that the action server is caching them.
Directly related to this ticket, I was unable to reproduce the failure with a unit test (Linux or Windows Debug):
I traced the execution and noticed that the status message for the new goal is arriving before the goal response is received. In other words, the following line
is executed after this line
and so it evaluates to false since there is no goal handle to remove.
I don't think we should make any assumptions about the arrival time of the goal status messages. Additionally, I don't think it makes sense to stop tracking goal handles unless they have expired. IMO we should remove this logic
And replace it with code that removes any goal handles that are not in the status list (expired). This assumes that the goal status list published by the action server is updated whenever goals expire, which I haven't confirmed.
@Barry-Xu-2018 If you'd like to update the handle_status_message logic, I'm happy to review a PR, thanks!
If we want multiple clients to query the same result, I believe it should be possible in theory, but we don't have any nice tools or APIs to do so at the moment. I think a separate ticket should be opened if this is needed.
@jacobperron
If you'd like to update the handle_status_message logic, I'm happy to review a PR, thanks!
Thanks.
About solution, you suggest removing below codes.
This leads to keep all goal result. If there are many actions to done, the memory will be consumed continuously. If the execution time is long enough, consumed memory should be a problem. Of cause, this is an extreme case.
Besides, the description of related API should be updated.
e.g.
https://github.com/ros2/rclcpp/blob/fd3655c26cc363c128402eaab7f563bdd17a3b7c/rclcpp_action/include/rclcpp_action/client.hpp#L392-L404
an active goal
=> an goal
I will consider whether there are other problems.
Do you think above fixing is workaround ? Based on my thought, action server provide goal cache mechanism can resolve this problem. Of cause, there is no implementation which action can get goal result from action server after goal status is successful. This implementation maybe modify many codes.
If we want multiple clients to query the same result, I believe it should be possible in theory, but we don't have any nice tools or APIs to do so at the moment. I think a separate ticket should be opened if this is needed.
Totally agree.
This leads to keep all goal result. If there are many actions to done, the memory will be consumed continuously. If the execution time is long enough, consumed memory should be a problem.
Correct, this is why I suggest replacing the code with new logic that removes goal handles when they expire, instead of terminate. I believe the action server already implements the goal cache mechanism correctly (I may be wrong). So, we just have to handle expired goals on the client side.
@Barry-Xu-2018 @jacobperron
I don't think we should make any assumptions about the arrival time of the goal status messages.
me neither, that something we should avoid.
Additionally, I don't think it makes sense to stop tracking goal handles unless they have expired.
agree.
we just have to handle expired goals on the client side.
agree. also 15 mins should be enough?
for multiple clients, client library just keeps results until they expire?
@jacobperron
Correct, this is why I suggest replacing the code with new logic that removes goal handles when they expire, instead of terminate. I believe the action server already implements the goal cache mechanism correctly (I may be wrong). So, we just have to handle expired goals on the client side.
Yes. Action server already implemented goal cache mechanism. Do you want to notify client removing goal result while goal expire in action server? Based on this way, I think it cannot make sure action client must receive notification.
So I think it is better that action client decides what time goal result can be removed. One solution is to remove goal result after user get goal result. If user not get goal result, goal result is always kept. (I need to investigate feasibility based on current codes). Other is to implement similar goal cache mechanism like action server. User can configure expire time.
Above is my current thought. Maybe there is a better solution.
@fujitatomoya
agree. also 15 mins should be enough?
Like action server, it is better to be configurable by user.
for multiple clients, client library just keeps results until they expire?
Do you mean multiple clients query the same result ?
Do you want to notify client removing goal result while goal expire in action server? Based on this way, I think it cannot make sure action client must receive notification.
I suggest we use the existing status message to notify the client that a goal has expired: the goal ID will not be in the status list if it has expired. It's true that if there is poor communication that the client may miss some of the status messages, but I think this is fine. In the extreme case where the client can't contact the server, goals won't be removed from the client, but new goals won't be added either. Once the client and server can talk again, it only takes one new status message to invalidate any expired goals on the client side. I'd rather not increase complexity by adding a separate expiration timer to the action client.
Removing the goal handle after we pass the result to the user is still fine. I believe this is the current behavior and doesn't need to change.
I suggest we use the existing status message to notify the client that a goal has expired: the goal ID will not be in the status list if it has expired.
About existing status, you means 'ABORTED' status (this is an example, we reuse this status) is sent from action service while goal result expires. Only if action client receive this event, it remove goal result.
Once the client and server can talk again, it only takes one new status message to invalidate any expired goals on the client side.
Sorry. I don't understand this senior. 'it' means action client ? What is 'one new status message' ? Could you give an example ?
BTW, I am going to take Chinese New Year holiday. During holiday, maybe I cannot response your comments. I will be back on Feb 1st.
Here's an example of what I'm thinking.
42
. There will be a status message with information:
If there is no communication between the action client and action server after step (3), then the action client will keep the goal handle "forever". But, if communication is restored, then the next time the action client sends a goal it will get another status message, for example:
42
. There will be a status message with information:
100
. There will be a status message with information:
42
is not in the list, so it must be expired. Action client removes the goal handle for 42
and adds a new goal handle for goal 100
.@jacobperron
quick question,
- Action client sends a second goal request.
what if client does not send a second goal?
and expiration time window could be dependent on use cases, do you think that it should be configurable?
Best,
what if client does not send a second goal?
My first thought was that the client will not receive a new status message and the client will not realize that the first goal has expired. I.e. the client will hold on to the first goal handle forever (or until the action server state updates again prompting a new status message).
But on second thought, since the status topic (by default) has QoS settings reliable and transient local, I guess that the client would actually receive the most recent status message once communication is restored. So I think my example could be simplified. There is no need for a second goal to be sent.
and expiration time window could be dependent on use cases, do you think that it should be configurable?
It is configurable. The expiration time can be set using the action server options in rcl
. These options are passed here when creating an action server.
I was unable to reproduce the failure with a unit test
@jacobperron if you want a unit test that this triggers on, I got you :smile_cat: https://github.com/ros-planning/navigation2/blob/master/nav2_util/test/test_actions.cpp#L408
This test will fail, lets say 20-30% of the time, from this issue at the second instance of:
future_result = node_->action_client_->async_get_result(goal_handle);
in that test. Always at the same spot and after reading this discussion, I believe its root cause.
@Barry-Xu-2018 Friendly ping. Let me know if you still plan to work on this.
@jacobperron
Let me know if you still plan to work on this.
Sorry for long break since coronavirus affects work and life.
I continue to do this.
https://github.com/ros2/rclcpp/issues/861#issuecomment-575760271
Thanks for your detailed explanation.
- Action client see that goal_id 42 is not in the list, so it must be expired. Action client removes the goal handle for 42 and adds a new goal handle for goal 100.
I have a question on this. There is a case which action client send goal more than once.
For example, there are 2 threads and one action client. Each thread call async_send_goal() of action client. If communication problem occurs and affect one of goal, we don't know which goal_id should be removed while send a new goal.
Of cause this is special abnormal case.
So I can implement below action
- Action server expires the goal and sends another status message. Now the goal has been removed:
- Action client sees that the goal is no longer in the status message, so we remove the goal handle.
For example, there are 2 threads and one action client. Each thread call async_send_goal() of action client. If communication problem occurs and affect one of goal, we don't know which goal_id should be removed while send a new goal.
I don't see an issue in this case. Goal IDs are unique. If the action client sends two goals, each goal should have a unique ID. Therefore, it should not be a problem to know which goal has expired. Put simply, if a goal has completed (a result message was received) and the client does not see the goal ID in the status message, then we can assume the goal has expired.
@jacobperron
For example, there are 2 threads and one action client. Each thread call async_send_goal() of action client. If communication problem occurs and affect one of goal, we don't know which goal_id should be removed while send a new goal.
I don't see an issue in this case. Goal IDs are unique. If the action client sends two goals, each goal should have a unique ID. Therefore, it should not be a problem to know which goal has expired.
Yes. Goal ID is unique.
My question is how to choose Goal which is affected by communication problem and remove it.
First, assume we have implemented "Action server expires the goal and sends another status message. Now the goal has been removed".
If these 2 goals have got result and results aren't removed in client side, communication problem affect one of goal receiving status message.
Under this condition, how do we find affected Goal ?
Put simply, if a goal has completed (a result message was received) and the client does not see the goal ID in the status message, then we can assume the goal has expired.
The status message isn't sent a fixed frequency. Client doesn't know setting of expiry time in server side. How long client doesn't receive status message of Goal ID, we can think this goal has expired ?
- Action server expires the goal and sends another status message. Now the goal has been removed:
- Action client sees that the goal is no longer in the status message, so we remove the goal handle.
I have implemented above actions.
I use GOAL_EVENT_ABORT as expiry notification. (You suggest to use existing event. https://github.com/ros2/rclcpp/issues/861#issuecomment-575288825).
Modifications include
Please review them. Current codes doesn't deal with communication problem. That is, if communication problem occurs (which lead to not receive expiry notification, Goal ID will not be removed in client side.
We can continue to discuss based on codes.
I suggest we use the existing status message to notify the client that a goal has expired: the goal ID will not be in the status list if it has expired.
@jacobperron I'm not sure this is the right direction. It goes against the design doc which says "The purpose of the topic is for introspection; it is not used by the action client".
I just encountered this bug while writing a test, but there's a lot to read up on. How can the goal be gone if there's still a client goal handle?
Based on current rclcpp codes, I think multiple clients have to opportunity to get it isn't implemented.
i guess that all of the clients should be able to get the result at least once.
If we want multiple clients to query the same result, I believe it should be possible in theory, but we don't have any nice tools or APIs to do so at the moment. I think a separate ticket should be opened if this is needed.
@Barry-Xu-2018 @fujitatomoya I don't think multiple clients getting the result for the same goal was considered, which is why the design doc is silent on it. I think the assumption is only the client who sent the goal cares about that goal's result. If multiple clients getting the result for the same goal is important to your application, I recommend opening an issue at https://github.com/ros2/design
I'm not sure this is the right direction. It goes against the design doc which says
Maybe we should make an amendment to the design doc? The current implementation makes use of the status topic already for updating the state of client goal handles.
Maybe we should make an amendment to the design doc? The current implementation makes use of the status topic already for updating the state of client goal handles.
I would try to fix the implementation first, unless there's a problem with the design. I don't think the status messages should be relied on since nothing guarantees that they'll be sent or when they'll be received by the client.
I think this can be fixed by the ClientGoalHandle
destructor signaling to the Client
when it's destructed. That gives a callback for the Client
to do any cleanup like removing it from goal_handles_
. That solves this bug since if someone cares about the result, they're holding onto a ClientGoalHandle
instance, so the goal handle must be in goal_handles_
. However, it is a little tricky since Client
#include
s ClientGoalHandle
and since nothing stops ClientGoalHandle
from being destructed after Client
.
One option is for Client
to have a member a std::shared_ptr<std::function<void(const ClientGoalHandle *)>> on_goal_handle_destruct_
which is set to a callback that deletes the goal handle, and give the ClientGoalHandle
constructor a std::weak_ptr<std::function<void(const ClientGoalHandle *)>>
as a callback to call when the goal handle is destructed. If the Client
exists when the ClientGoalHandle
is destructed, the weak pointer can be promoted to a shared pointer and the callback can be called.
It also looks like the client will need to keep a subscription to the status topic to support ClientGoalHandle::get_status()
, but that fits within the intent of the topic being for introspection.
@sloretz
Based on current rclcpp codes, I think multiple clients have to opportunity to get it isn't implemented.
i guess that all of the clients should be able to get the result at least once.
If we want multiple clients to query the same result, I believe it should be possible in theory, but we don't have any nice tools or APIs to do so at the moment. I think a separate ticket should be opened if this is needed.
@Barry-Xu-2018 @fujitatomoya I don't think multiple clients getting the result for the same goal was considered, which is why the design doc is silent on it. I think the assumption is only the client who sent the goal cares about that goal's result. If multiple clients getting the result for the same goal is important to your application, I recommend opening an issue at https://github.com/ros2/design
Yes. Current design doesn't consider this case. If necessary, I create new issue to discuss it.
@Barry-Xu-2018 Thank you for the PR (it looks good to me), but I agree with @sloretz's assessment: it would be best if we can solve this problem without relying on the status topic.
@sloretz What if we instead maintain a list of weak pointers to goal handles in Client
? Then anytime we need to operate on a goal handle we first check to see if it is still valid (i.e. a user still holds a reference to it) and if not, then we remove it from the Client
list. I think this is more straight forward than the callback mechanism being proposed.
@jacobperron
Thank you for the PR (it looks good to me), but I agree with @sloretz's assessment: it would be best if we can solve this problem without relying on the status topic.
Thanks for checking. I agree with you. The idea from @sloretz is a better way.
@jacobperron @sloretz
I got your idea. But I find there is a limitation for user.
User must hold ClientGoalHandle.
In example code for ROS2
It doesn't hold ClientGoalHandle. It only does something in callback function. If modify based on your way, callback function never be called.
Of course, real codes doesn't like this.
@sloretz
Friendly ping. How about your fixing ?
Besides, jacobperron have some comments. https://github.com/ros2/rclcpp/issues/861#issuecomment-595973399
I've started trying a solution involving storing weak pointers to goal handles in rclcpp_action::Client
.
Feedback appreciated on https://github.com/ros2/rclcpp/pull/1122
@sloretz If you have a moment.
@Barry-Xu-2018 @SteveMacenski I'd like to have a unit test that can reliable reproduce the issue, but struggling to make one. Do you have more complex scenarios where you see the bug and could perhaps check if the referenced PR helps at all?
@jacobperron the link I showed above (https://github.com/ros-planning/navigation2/blob/master/nav2_util/test/test_actions.cpp) triggers it ~30% of the time I'd say, if you ran those action utils tests 20 times in a loop and it didn't fail in any of them, you've definitely fixed it. Those tests are also pretty portable and you're welcome to steal them and move them here. We use a simple action server for the test fixture but that can be easily replaced with your own server to call.
I'll take a look at the PR.
What you could do to run on our end in the laziest way possible (which I encourage) is add your branch of rclcpp to our repos
file (https://github.com/ros-planning/navigation2/blob/master/tools/ros2_dependencies.repos) and run locally or submit a draft PR. I mostly suggest the draft PR because we have a ton of caching in CI so its faster than building the docker image from scratch. Run it a dozen times and it should be clear if that actions test is failing.
With a Debug build on Linux, I was able to reproduce according to this ticket's description: uncommenting this line in test_buffer_client
. Without a fix, it appears to fail about 15 out of 100 times. With the fix proposed in #1122, I haven't seen a test failure.
Here is the incantation I used for testing:
failures=0; for i in {0..99} ; do ; if ! ./build/tf2_ros/test_buffer_client; then failures=$(($failures+1)); fi; done; echo "Number of failures: $failures"
If nothing else, you definitely fixed something then :1st_place_medal: (but yes, on review, that looks correct to me)
@jacobperron
With a Debug build on Linux, I was able to reproduce according to this ticket's description: uncommenting this line in test_buffer_client.
I use the same way to reproduce this problem.
Thanks everyone for the discussion and reviews! It looks like #1122 resolves this bug.
Bug report
While investigating a failure as part of https://github.com/ros2/geometry2/pull/172, I found out that action servers and clients in the same process may run into a race. If the server immediately sets a goal to "success" or "failed" during the "handle_accept" callback (like here), the entire action may be gone by the time the client goes to get the result (via async_get_result). I was only able to reproduce this particular behavior while using Windows Debug. It's not entirely clear to me if this is a bug, since I'm not sure if it is valid for a server to set the goal to "success" or "fail" during the "handle_accept" callback. Nonetheless, @jacobperron thought it would be a good idea to open a bug to track this.
Required Info:
colcon build --merge-install --cmake-args -DCMAKE_BUILD_TYPE=Debug
)Steps to reproduce issue
Comment out the workaround line here.
Build on Windows in Debug mode:
Run on Windows in Debug mode (you may have to do this several times to make the issue happen):
Expected behavior
Test passes every time.
Actual behavior
Test fails sometimes with "Goal handle is not known to this client"