ros-controls / ros2_control

Generic and simple controls framework for ROS 2
https://control.ros.org
Apache License 2.0
433 stars 272 forks source link

Bug fix for switch_controller when using controller chaining #1591

Open TakashiSato opened 1 week ago

TakashiSato commented 1 week ago

This PR is separated to address only the changes related to bug fix #1568 , fixing a bug that occurs under specific conditions when handling chainable controllers in the controller_manager's switch_controller.

Note: To focus on meaningful changes, it is recommended to compare the code using "hide whitespace" mode as follows. https://github.com/ros-controls/ros2_control/compare/master...TakashiSato:ros2_control:feature/fix_switch_controller_bug?diff=split&w=1

Explanation of the Bugs

Using the typical model for controller chaining, which is also used in this test, the following explanations introduce three situations where bugs occur.

Note that these bugs occur only when the strictness value is set to BEST_EFFORT. When the strictness value is set to STRICT, switch_controller fails, and these bugs do not arise. Additionally, the test cases to detect these bugs are implemented in this commit.

BUG1

In this situation, the start request for diff_drive should be rejected, and only the stop command for pid_right should be accepted. While this result is achieved, there is an issue where pid_left incorrectly enters in_chained mode.

The cause of this bug lies in this section. When checking the activate request for diff_drive, the first following controller, pid_left, is checked for any inconsistencies along with the request. Since no issues are detected, pid_left is added to the to_chained_mode_request. However, when the check is performed with the next controller, pid_right, inconsistencies are detected because pid_right is targeted for deactivation, leading to the rejection of the diff_drive activate request. Since the to_chained_mode_request is not re-evaluated in subsequent processes, this request remains and results in the actual transition to chained mode during manage_switch.

BUG2

This situation is similar to BUG1, where the start request for diff_drive is rejected, and only pid_left is activated. However, pid_left still incorrectly enters in_chained mode. The cause of this bug is almost the same as BUG1. During the activate check for diff_drive, since the following controller pid_left is also a target for activation, pid_left is added to the to_chained_mode_request. Then, during the subsequent check with pid_right, since pid_right is not a target for activation, the activate request for diff_drive is rejected, leaving the to_chained_mode_request for pid_left.

Incidentally, in this situation, the checks are performed in the order of pid_left followed by pid_right. Therefore, if pid_right and pid_left in the switch requests for BUG1 and BUG2 are swapped, the checks will function correctly, and these issues will not occur.

BUG3

In this situation, since it is impossible to deactivate only diff_drive, the request should be rejected and no changes should occur. However, the following switch requests are internally generated, and since these switches cannot be handled properly, a deadlock occurs.

The sequence in which such switch requests are generated is as follows:

  1. Since diff_drive is a deactivate target, in propagate_deactivation_of_chained_mode, its following controllers, pid_left and pid_right, are added to the from_chained_mode_request.
  2. The activate request is empty, so check_following_controllers_for_activate does nothing. (Note: The process to erase from from_chained_mode_request exists only within this function.)
  3. In check_preceding_controllers_for_deactivate, the preceding controller of diff_drive, position_tracking, is active and not present in the deactivate_request, resulting in the rejection of the diff_drive deactivate request.
  4. Since pid_left and pid_right are included in the from_chained_mode_request and meet the conditions, they are added to the (de)activate_request as restart targets due to this process.

Bug Fix Proposal

In this part of the check process, propagate_deactivation_of_chained_mode, check_following_controllers_for_activate, and check_preceding_controllers_for_deactivate perform judgments based on the content of the (de)activate_request. Therefore, if the content of the (de)activate_request changes, the process should be retried from the beginning. However, during this process, it is also possible that the content of from/to_chained_mode_request has changed, so it is necessary to properly clear these as well.

This commit demonstrates the simplest change to fix the bug based on this idea.

However, since the above commit uses goto, which is generally not recommended, the final implementation in this PR has been rewritten using lambda functions and a while loop for the retry process (The corresponding commit).

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 97.05882% with 2 lines in your changes missing coverage. Please review.

Project coverage is 87.88%. Comparing base (fbb893b) to head (005a79f). Report is 3 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1591 +/- ## ========================================== + Coverage 87.70% 87.88% +0.17% ========================================== Files 102 102 Lines 8704 7950 -754 Branches 780 668 -112 ========================================== - Hits 7634 6987 -647 + Misses 790 721 -69 + Partials 280 242 -38 ``` | [Flag](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1591/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1591/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls) | `87.88% <97.05%> (+0.17%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1591?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls) | Coverage Δ | | |---|---|---| | [...t\_controllers\_chaining\_with\_controller\_manager.cpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1591?src=pr&el=tree&filepath=controller_manager%2Ftest%2Ftest_controllers_chaining_with_controller_manager.cpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3Rlc3QvdGVzdF9jb250cm9sbGVyc19jaGFpbmluZ193aXRoX2NvbnRyb2xsZXJfbWFuYWdlci5jcHA=) | `98.95% <100.00%> (+0.10%)` | :arrow_up: | | [controller\_manager/src/controller\_manager.cpp](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1591?src=pr&el=tree&filepath=controller_manager%2Fsrc%2Fcontroller_manager.cpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls#diff-Y29udHJvbGxlcl9tYW5hZ2VyL3NyYy9jb250cm9sbGVyX21hbmFnZXIuY3Bw) | `71.44% <94.11%> (-3.12%)` | :arrow_down: | ... and [7 files with indirect coverage changes](https://app.codecov.io/gh/ros-controls/ros2_control/pull/1591/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ros-controls)
mergify[bot] commented 2 days ago

This pull request is in conflict. Could you fix it @TakashiSato?

TakashiSato commented 2 days ago

I have fixed the conflict.

Due to the changes in the recently merged #1021, BUG3 no longer manifests. Regarding BUG1 and BUG2, issues still persist. For BUG3, although the issue no longer occurs, there are still abnormal requests in the from/to_chained_mode request up until just before the error check. Therefore, I believe the fix proposed in this PR remains desirable.