robosoft-ai / SMACC2

An Event-Driven, Asynchronous, Behavioral State Machine Library for ROS2 (Robotic Operating System) applications written in C++
https://smacc.dev
Apache License 2.0
223 stars 36 forks source link

SmaccAsyncClientBehavior onExit() freezes when cancelGoal() is called on the attached action client #540

Closed sukhrajklair closed 2 months ago

sukhrajklair commented 3 months ago

Describe the bug I created a custom asynchronous client behavior by inheriting from SmaccAsyncClientBehavior which sends a goal to an action server (using the attached smacc action client) and cancel the goal inside onExit(). The goal gets sent successfully and EvCbFinished is transmitted. But when onExit() for the behavior is called due to a transition to another state, the cb's onExit() never finishes. The action server does receive a request to cancel all goals and actually cancels the goals. I have other orthogonals in this state which include subscriber clients and even their callbacks don't get called.

Expected behavior The onExit() of CB should finish and the state should transition

Environment (please complete the following information):

ROS DETAILS:

REPO DETAILS:

BUILD DETAILS

Additional context I took a look at the implementation of the smacc_action_client_base and believe that this line is an issue. This line pauses all of the callback execution of the node and the node never receives the cancellation response from the action server consequently the future never returns. When I replaced it with rclcpp::spin_until_future_complete(getNode()->get_node_base_interface(), fut);, onExit() of the CB finishes and the state machine transitions to the next state.

┆Issue is synchronized with this Jira Task by Unito

pabloinigoblasco commented 3 months ago

Hello @sukhrajklair, thank you for sharing your findings.

I appreciate your efforts in addressing this issue. It's crucial that we explore this further as it may reveal potential improvements. Nevertheless, I'd like to carefully consider your hypothesis and proposed solution.

You mentioned that a specific line halts all callback executions within the node, which consequently prevents the node from receiving the cancellation response from the action server, leaving the future unresolved. However, I'm inclined to disagree that all callbacks are halted. There's another thread, the "signal detector thread," that operates independently from the onExit function of the AsynchronousClientBehavior. This thread should handle the cancel request/response and unlock the future.

Regarding your solution—replacing the problematic line with rclcpp::spin_until_future_complete(getNode()->get_node_base_interface(), fut)—it's intriguing that this change allowed the onExit of the CB to complete and facilitated the state transition. However, I believe this approach may introduce conflicts with the "signal detector" thread's main ROS loop. Therefore, I'm hesitant to endorse this solution without further examination.

To proceed effectively, I suggest creating a basic test case to replicate this behavior. This would allow us to collaboratively assess whether a fix or modification is necessary.

sukhrajklair commented 3 months ago

Hello @pabloinigoblasco

I'm still new to the SMACC2 library, so plesae take everything I say with a grain of salt. I've created an example state machine and clients which reproduce the issue I faced. Please check it out here image

ClModeSelect+CbModeSelect subscribes to a topic called "/mode_command" and generates an event based on the value received. ClFibonacci+CbFibonacci creates an action client and sends goal to the Fibonacci_action_server provided in action_tutorials_cpp pkg

Please follow these steps to reproduce the issue:

  1. build the simple_action_client_example pkg that I added to the smacc2_sm_reference_library directory in my above linked forked repo
  2. run the state machine: ros2 run simple_action_client_example simple_action_client_example_node
  3. run the Fibonacci action server: ros2 run action_tutorials_cpp fibonacci_action_server
  4. publish msg on /mode_command topic: ros2 topic pub /mode_command example_interfaces/msg/Int32 "{data: 1}". This should cause the state machine to transition to StStart2. Upon entry to StStart2, CbFibonacci sends a goal to the fibonacci_action_server.
  5. publish another msg /mode_command topic: ros2 topic pub /mode_command example_interfaces/msg/Int32 "{data: 0}". This generates an event and should trigger a transition back to StStart1 which will consequently call onExit() of CbFibonacci. Inside onExit(), it calls cancelGoal() of ClFibonacci. The action server receives a request to cancel all goals and so it does. However, the onExit() never finishes. Any further msgs published on /mode_command topic do not generate any event in the state machine.

When I say "all callback executions within the node are halted", I basically extrapolated this from my observation in the step 5 of the above process.

When I replace fut.wait() with rclcpp::spin_until_future_complete(getNode()->get_node_base_interface(), fut) and repeats the above steps, the onExit() of CbFibonacci finishes and state machine transitions back to StState1.

pabloinigoblasco commented 3 months ago

Thank you for sharing this information, @sukhrajklair. I appreciate your help. I'm going to replicate your case and perform some debugging to better understand the issue. I'll follow up with you soon to provide further insights.

pabloinigoblasco commented 3 months ago

Hello @sukhrajklair we have been analyzing this. Indeed the bug is there, I can confirm that. However the proposed solution is incorrect because it blocks the signal detector thread. We are working in an alternative solution that I hope it will be around in the next few days.

Thanks.

pabloinigoblasco commented 2 months ago

This is the solution I propose, esentially removing the condition of synchronicity in a cancel request.

https://github.com/robosoft-ai/SMACC2/pull/542

There is some small discussion here why that future wait was created, though. Your demo still works with this solution.

Regards.

sukhrajklair commented 2 months ago

@pabloinigoblasco I can't think of any reason, at least for my application, why I would need the cancel request to be synchronous. So this solution works. I appreciate your effort on this.

brettpac commented 2 months ago

I just wanted to say thank you @sukhrajklair for writing a fantastic ticket, along with a code sample so that we could reproduce the issue.

Would you be willing to create a pull request with the state machine sm_simple_action_client in the reference library that we could add to the main branch?

sukhrajklair commented 2 months ago

@brettpac I appreciate the timely response. I've created a PR with my code.

brettpac commented 2 months ago

Thank you @sukhrajklair, we'll be merging it shortly.