ros-controls / ros2_control

Generic and simple controls framework for ROS 2
https://control.ros.org
Apache License 2.0
479 stars 293 forks source link

Controller restart by switch_controller #1568

Closed TakashiSato closed 3 months ago

TakashiSato commented 3 months ago

This PR proposes the controller restart function as discussed in issue #674.

In ROS1, controller restart was implemented by passing true to the reset_controller argument of the ControllerManager::update function, as demonstrated here. This approach allowed the controller to be stopped, started, and updated consecutively within a single execution of the ControllerManager::update function, enabling continuous robot operation without interrupting control calculations. However, applying this method to the update function in ROS2 would cause an API break, significantly impacting users.

Alternatively, PR #677 describes the "simplest way" to perform a restart by specifying the same controller name for both start and stop in the switch_controller function. However, the current implementation of ControllerManager cannot handle this input correctly, making it unfeasible. Consequently, the current method involves executing stop first, followed by start, leading to a temporary stop period and causing discontinuity in the controller's operation.

To address this, the implementation has been modified to allow the switch_controller function to perform a restart even when the same controller is specified for both start and stop. Additionally, during the implementation, it was discovered that the current logic of switch_controller is complex and contains potential bugs, necessitating a refactor of its logic.

As a result, this PR includes a significant number of changes. However, I have confirmed that all existing tests, as well as additional tests for previously identified potential bugs, have passed without issues. Furthermore, new tests for the restart functionality have been created and passed successfully.

If you have any comments or feedback, please feel free to share your thoughts.

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 95.42744% with 23 lines in your changes missing coverage. Please review.

Project coverage is 88.49%. Comparing base (1ed61ef) to head (3721ad6). Report is 6 commits behind head on master.

:exclamation: Current head 3721ad6 differs from pull request most recent head 1865ab7

Please upload reports for the commit 1865ab7 to get more accurate results.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1568 +/- ## ========================================== + Coverage 87.70% 88.49% +0.79% ========================================== Files 102 105 +3 Lines 8704 9015 +311 Branches 780 790 +10 ========================================== + Hits 7634 7978 +344 + Misses 790 758 -32 + Partials 280 279 -1 ``` | [Flag](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls) | `88.49% <95.42%> (+0.79%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls) | Coverage Δ | | |---|---|---| | [.../include/controller\_manager/controller\_manager.hpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Finclude%2Fcontroller_manager%2Fcontroller_manager.hpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL2luY2x1ZGUvY29udHJvbGxlcl9tYW5hZ2VyL2NvbnRyb2xsZXJfbWFuYWdlci5ocHA=) | `100.00% <ø> (ø)` | | | [...chainable\_controller/test\_chainable\_controller.cpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Ftest%2Ftest_chainable_controller%2Ftest_chainable_controller.cpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3Rlc3QvdGVzdF9jaGFpbmFibGVfY29udHJvbGxlci90ZXN0X2NoYWluYWJsZV9jb250cm9sbGVyLmNwcA==) | `85.24% <100.00%> (+1.03%)` | :arrow_up: | | [...chainable\_controller/test\_chainable\_controller.hpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Ftest%2Ftest_chainable_controller%2Ftest_chainable_controller.hpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3Rlc3QvdGVzdF9jaGFpbmFibGVfY29udHJvbGxlci90ZXN0X2NoYWluYWJsZV9jb250cm9sbGVyLmhwcA==) | `100.00% <ø> (ø)` | | | [...r\_manager/test/test\_controller/test\_controller.cpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Ftest%2Ftest_controller%2Ftest_controller.cpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3Rlc3QvdGVzdF9jb250cm9sbGVyL3Rlc3RfY29udHJvbGxlci5jcHA=) | `95.45% <100.00%> (+0.71%)` | :arrow_up: | | [...r\_manager/test/test\_controller/test\_controller.hpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Ftest%2Ftest_controller%2Ftest_controller.hpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3Rlc3QvdGVzdF9jb250cm9sbGVyL3Rlc3RfY29udHJvbGxlci5ocHA=) | `100.00% <ø> (ø)` | | | [...ller\_with\_command/test\_controller\_with\_command.hpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Ftest%2Ftest_controller_with_command%2Ftest_controller_with_command.hpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3Rlc3QvdGVzdF9jb250cm9sbGVyX3dpdGhfY29tbWFuZC90ZXN0X2NvbnRyb2xsZXJfd2l0aF9jb21tYW5kLmhwcA==) | `100.00% <100.00%> (ø)` | | | [...t\_controllers\_chaining\_with\_controller\_manager.cpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Ftest%2Ftest_controllers_chaining_with_controller_manager.cpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3Rlc3QvdGVzdF9jb250cm9sbGVyc19jaGFpbmluZ193aXRoX2NvbnRyb2xsZXJfbWFuYWdlci5jcHA=) | `99.24% <100.00%> (+0.39%)` | :arrow_up: | | [...ontroller\_manager/test/test\_restart\_controller.cpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Ftest%2Ftest_restart_controller.cpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3Rlc3QvdGVzdF9yZXN0YXJ0X2NvbnRyb2xsZXIuY3Bw) | `100.00% <100.00%> (ø)` | | | [...ller\_with\_command/test\_controller\_with\_command.cpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Ftest%2Ftest_controller_with_command%2Ftest_controller_with_command.cpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3Rlc3QvdGVzdF9jb250cm9sbGVyX3dpdGhfY29tbWFuZC90ZXN0X2NvbnRyb2xsZXJfd2l0aF9jb21tYW5kLmNwcA==) | `88.23% <88.23%> (ø)` | | | [controller\_manager/src/controller\_manager.cpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1568?src=pr&el=tree&filepath=controller_manager%2Fsrc%2Fcontroller_manager.cpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3NyYy9jb250cm9sbGVyX21hbmFnZXIuY3Bw) | `78.82% <90.78%> (+4.26%)` | :arrow_up: |
christophfroehlich commented 3 months ago

With "restart" you mean deactivate and activate transition? Could you please comment on a usecase, where this is necessary? Isn't there something missing in the controller's interfaces if something like this is necessary?

TakashiSato commented 3 months ago

With "restart" you mean deactivate and activate transition?

Yes. I referred to the process of deactivating a controller from an active state and then reactivating it within the same update loop as a "restart."

Here are some use cases where this might be useful:

In common real robots like robotic arms, there are often functions to stop the robot when it receives an external signal input to the controller or when a self-diagnosis function detects a dangerous condition. These stopping methods include an "emergency stop," which cuts off power supply to the joint motors and reduces driving torque to zero (typically, electromagnetic brakes are also applied to hold the joints to prevent them from falling due to gravity), and a "protective stop," which continues to generate torque to hold the position after decelerating through joint control.

When these stopping functions are triggered in a robot using ros2_control, the real robot controller will override the commands from ros2_controllers for safety. Currently, ros2_control does not have a general method to detect such stop states or command overrides by the real-world robot controller, which can lead to a mismatch between the actual state of the robot and the commands from ros2_controllers.

While the JointTrajectoryController (JTC) has a function to hold position when the deviation between the robot's current state (e.g., position) and the command value exceeds a certain threshold (here), not all ros2_controllers have such closed-loop monitoring functions (e.g., ForwardCommandController). Even in JTC, deviations within the threshold may not be considered abnormal, allowing discrepancies between command and actual values. If the stop state is released under these conditions, there is a risk of sudden joint movement being triggered.

Therefore, in many cases, to re-enable joint control by ros2_control after an emergency stop or protective stop (particularly important), it is necessary to reset the command state of ros2_controllers (e.g., matching the command position with the current position, setting the command speed to zero, erase buffered value, etc.). As mentioned in the first comment of this PR, since there is no general method to do this in ros2_control, I proposed a workaround to "restart" the same controller by specifying it in switch_controller.

However, after reconsidering your feedback, it may not be necessary to perform the deactivate (stop) and activate (start) within the same update loop as I originally proposed (i.e., they could be executed by separate switch_controller requests). Here is how this relates to our robot use case:

In our robot, we have implemented processing in a custom HardwareComponent (SystemInterface) to switch the commands sent to the real robot controller based on the situation where the command_interface (Position/Velocity/Effort, etc.) is claimed. Related to this processing, we have implemented a decision process to disable the servo state and apply the electromagnetic brakes for safety when there are joints not claimed by any ros2_controllers, resulting in an undefined command_interface.

Therefore, if we handle deactivate and activate with separate switch_controller requests, the servo state would be disabled during the period when no ros2_controllers are assigned, triggering the decision process mentioned above. To avoid this (and reset the controller state while keeping the servo state enabled), I proposed executing deactivate and activate within the same update loop.

If "restart" is not considered useful and only adds complexity to the logic, I am open to revising the PR to focus on refactoring and bug fixes instead of the "restart" feature.

christophfroehlich commented 3 months ago

Wouldn't it be useful if your hardware component reports the emergency stop to the controller_manager, e.g. by returning an error in the read/write methods, and the controllers claiming the interfaces are stopped automatically? Then everything could be started again later on. but i'm not sure how far this is implemented already tbh.

saikishor commented 3 months ago

Hello!

I suggest doing as @christophfroehlich suggested. If there is this kind of error or the emergency stop, you can return return_type::ERROR either in the read cycle or in the write cycle or both. This will make the controller manager automatically deactivate all the active controllers that use the interfaces of the failing hardware. Once you do the hardware lifecycle, you can restart the controllers automatically.

However, the servo_state you were mentioning earlier would be affected. Would the earlier explained process work for you? Or not?

Could you also please share with us the list of bugs you have addressed here?.

Thank you

christophfroehlich commented 3 months ago

Despite the discussion of your use case, we agreed that restarting should be possible. @TakashiSato This PR has 2300 line changes, which makes it very hard to review. Could you maybe split this into several ones, and describe which bugs/changes you address with every PR?

TakashiSato commented 3 months ago

@christophfroehlich @saikishor Thank you for your comments!

After reviewing the implementation and behavior of the method you suggested, where the hardware component's read/write returns return_type::ERROR, we believe this approach is feasible. However, since we had previously implemented the function to handle e_stop or p_stop requests and their release as a ros_controller (named safety_controller) based on our ROS1 implementation, we found it challenging to directly apply the aforementioned method, which deactivates all controllers.

Nevertheless, we believe this issue can be resolved by implementing functionality similar to safety_controller on the hardware_component side.

Additionally, we discovered that by returning return_type::DEACTIVATE in the hardware component's read/write, we can transition the hardware component to an inactive state without affecting the controllers. (For reference, links to the related implementation are provided below:)

Using return_type::DEACTIVATE may allow us to retain functionality similar to safety_controller on the ros2_controller side. This approach seems particularly suitable for purposes such as protective stop with deceleration control or maintaining the servo_state.

The choice of approach will depend on our requirements and implementations, so we will consider methods that induce state transitions in the hardware component moving forward.

On the other hand, as @christophfroehlich mentioned, the controller restart function itself seems useful regardless of the above discussion.

Regarding the list of bugs I have addressed and the division of PRs, it may take some time to organize the information and code, but I plan to address this in the near future.

saikishor commented 3 months ago

@TakashiSato may be it's better to open separate PRs for separate bugs, so it's faster and easier to get them merged. What do you think?

TakashiSato commented 3 months ago

@saikishor Thank you for your advice! I'm sorry for the confusion caused by some of my commits in an effort to organize information. I am planning to submit bug fixes and the implementation of restart (a version with fewer implementation changes) in separate PRs shortly.

TakashiSato commented 3 months ago

@christophfroehlich @saikishor I have separated the changes in this PR into the following two PRs:

As included in this PR, I am also considering proposing further refactoring related to the switch_controller logic. However, since the above two PRs address all functional issues, I may not proceed with the refactoring in the immediate future.

Should I close this PR?

christophfroehlich commented 3 months ago

Thanks, yes let's close this PR.