ros-controls / ros2_control_demos

This repository aims at providing examples to illustrate ros2_control and ros2_controllers
https://control.ros.org
Apache License 2.0
393 stars 185 forks source link

"on_deactivate()" seems not called when setting the controller to be "inactive" #239

Closed xibeisiber closed 1 year ago

xibeisiber commented 1 year ago

Hi, I tried diffbot.launch.py and rrbot_system_position_only.launch.py. The printing information in on_activate() is printed out in console, but when I "ros2 control set_controller_state forward_position_controller inactive" and "ros2 control set_controller_state diffbot_base_controller inactive", the printing information in "on_deactivate()" is not printed in console.

Another question is: will on_deactivate() be automatically called when I terminate the startup node (press ctrl+c indiffbot.launch.py)?

Thanks! Jia

christophfroehlich commented 1 year ago

Hi, I think this is not an issue/question of this demo-repository but of https://github.com/ros-controls/ros2_control directly.

My experience on the termination with ctrl+c is that it is not really reproducible what happens (independent of ros2_control) ;)

destogl commented 1 year ago

I tried diffbot.launch.py and rrbot_system_position_only.launch.py. The printing information in on_activate() is printed out in console, but when I "ros2 control set_controller_state forward_position_controller inactive" and "ros2 control set_controller_state diffbot_base_controller inactive", the printing information in "on_deactivate()" is not printed in console.

Oh, this is very strange. this should definitely not happen. This part of ros2_control is well tested. I would not expect bugs there, nevertheless it is possible. I will test this when testing new restructured examples in #232 to be sure. Can you check the example one from this PR and report if you get the same behavior?

Another question is: will on_deactivate() be automatically called when I terminate the startup node (press ctrl+c indiffbot.launch.py)?

I cannot answer you in detail – I know that we had a discussion about this and about catching events. But for what I know, we are not doing a graceful “shutdown” of the framework. It is definitely something to think about, but until now, it wasn't really an issue in all the robots we have tested this.

This functionality has to be shaped well since it would include all standard and user-components to have pre-defined behavior.

xibeisiber commented 1 year ago

I tried diffbot.launch.py and rrbot_system_position_only.launch.py. The printing information in on_activate() is printed out in console, but when I "ros2 control set_controller_state forward_position_controller inactive" and "ros2 control set_controller_state diffbot_base_controller inactive", the printing information in "on_deactivate()" is not printed in console.

Oh, this is very strange. this should definitely not happen. This part of ros2_control is well tested. I would not expect bugs there, nevertheless it is possible. I will test this when testing new restructured examples in #232 to be sure. Can you check the example one from this PR and report if you get the same behavior?

Another question is: will on_deactivate() be automatically called when I terminate the startup node (press ctrl+c indiffbot.launch.py)?

I cannot answer you in detail – I know that we had a discussion about this and about catching events. But for what I know, we are not doing a graceful “shutdown” of the framework. It is definitely something to think about, but until now, it wasn't really an issue in all the robots we have tested this.

This functionality has to be shaped well since it would include all standard and user-components to have pre-defined behavior.

Yes. I can reproduce the behavior and find a new problem.. Environment: ros2 humble, ubuntu 22.04, x86_64. The ros2_control_demo is in commit 1007b1dfe44f4146e64b04666a1164adeaf8ffa1.

I comment the print information of read()&write() in diffbot_system.cpp. When launching diffbot.launch.py, the info in on_activate() is printed out.

When ros2 control set_controller_state diffbot_base_controller inactive, the info in on_deactivate() is not printed out. But this controller is successfully disactived by "ros2 control list_controllers".

When ros2 control set_controller_state diffbot_base_controller active again, the node dies. error message is:

[INFO] [spawner-4]: process has finished cleanly [pid 17612]
[ros2_control_node-1] Stack trace (most recent call last) in thread 17583:
[ros2_control_node-1] #6    Object "", at 0xffffffffffffffff, in 
[ros2_control_node-1] #5    Source "../sysdeps/unix/sysv/linux/x86_64/clone3.S", line 81, in __clone3 [0x7f62867269ff]
[ros2_control_node-1] #4    Source "./nptl/pthread_create.c", line 442, in start_thread [0x7f6286694b42]
[ros2_control_node-1] #3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f6286adc2b2, in 
[ros2_control_node-1] #2    Object "/opt/ros/humble/lib/controller_manager/ros2_control_node", at 0x555b4ec873db, in 
[ros2_control_node-1] #1    Object "/opt/ros/humble/lib/libcontroller_manager.so", at 0x7f6286efe7a7, in controller_manager::ControllerManager::update(rclcpp::Time const&, rclcpp::Duration const&)
[ros2_control_node-1] #0    Object "/opt/ros/humble/lib/libdiff_drive_controller.so", at 0x7f627c3baf89, in diff_drive_controller::DiffDriveController::update(rclcpp::Time const&, rclcpp::Duration const&)
[ros2_control_node-1] Segmentation fault (Signal sent by the kernel [(nil)])
[ERROR] [ros2_control_node-1]: process has died [pid 17555, exit code -11, cmd '/opt/ros/humble/lib/controller_manager/ros2_control_node --ros-args --params-file /tmp/launch_params__3imcxhq --params-file /home/jia/Workspace/ros2_ws/install/ros2_control_demo_bringup/share/ros2_control_demo_bringup/config/diffbot_controllers.yaml'].
christophfroehlich commented 1 year ago

I can reproduce your behavior.

Honestly I'm not sure about the lifecycle of the hardware interface, should it be activated/deactivated automatically if doing so with the controllers? @destogl?

The segmentation fault happens only with the diffbot example. I do not see a difference from rrbot with diffbot hardware from this repository, according to the stack trace (backward_ros ftw) the diff_drive_controller throws the error caused from a nullptr? I'll have a look if there is a difference in the lifecycle of the different controllers.

[ros2_control_node-1] Stack trace (most recent call last) in thread 16525:
[ros2_control_node-1] #6    Object "", at 0xffffffffffffffff, in 
[ros2_control_node-1] #5    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f662cee09ff, in 
[ros2_control_node-1] #4    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f662ce4eb42, in 
[ros2_control_node-1] #3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f662d0de2b2, in 
[ros2_control_node-1] #2    Source "/workspaces/ros2_rolling_ws/src/ros2_control/controller_manager/src/ros2_control_node.cpp", line 81, in operator() [0x55f1261e40d0]
[ros2_control_node-1]          79:         // execute update loop
[ros2_control_node-1]          80:         cm->read(cm->now(), measured_period);
[ros2_control_node-1]       >  81:         cm->update(cm->now(), measured_period);
[ros2_control_node-1]          82:         cm->write(cm->now(), measured_period);
[ros2_control_node-1]          83: 
[ros2_control_node-1]          84:         // wait until we hit the end of the period
[ros2_control_node-1] #1    Source "/workspaces/ros2_rolling_ws/src/ros2_control/controller_manager/src/controller_manager.cpp", line 1741, in update [0x7f662d4f3957]
[ros2_control_node-1]        1739:       if (controller_go)
[ros2_control_node-1]        1740:       {
[ros2_control_node-1]       >1741:         auto controller_ret = loaded_controller.c->update(
[ros2_control_node-1]        1742:           time, (controller_update_rate != update_rate_ && controller_update_rate != 0)
[ros2_control_node-1]        1743:                   ? rclcpp::Duration::from_seconds(1.0 / controller_update_rate)
[ros2_control_node-1]        1744:                   : period);
[ros2_control_node-1] #0  | Source "/workspaces/ros2_rolling_ws/src/ros2_controllers/diff_drive_controller/src/diff_drive_controller.cpp", line 249, in set_value
[ros2_control_node-1]     |   247:   for (size_t index = 0; index < static_cast<size_t>(params_.wheels_per_side); ++index)
[ros2_control_node-1]     |   248:   {
[ros2_control_node-1]     | > 249:     registered_left_wheel_handles_[index].velocity.get().set_value(velocity_left);
[ros2_control_node-1]     |   250:     registered_right_wheel_handles_[index].velocity.get().set_value(velocity_right);
[ros2_control_node-1]     |   251:   }
[ros2_control_node-1]     | Source "/workspaces/ros2_rolling_ws/src/ros2_control/hardware_interface/include/hardware_interface/loaned_command_interface.hpp", line 66, in set_value
[ros2_control_node-1]     |    64:   const std::string & get_prefix_name() const { return command_interface_.get_prefix_name(); }
[ros2_control_node-1]     |    65: 
[ros2_control_node-1]     | >  66:   void set_value(double val) { command_interface_.set_value(val); }
[ros2_control_node-1]     |    67: 
[ros2_control_node-1]     |    68:   double get_value() const { return command_interface_.get_value(); }
[ros2_control_node-1]       Source "/workspaces/ros2_rolling_ws/src/ros2_control/hardware_interface/include/hardware_interface/handle.hpp", line 111, in update [0x7f662417a0f1]
[ros2_control_node-1]         109:   void set_value(double value)
[ros2_control_node-1]         110:   {
[ros2_control_node-1]       > 111:     THROW_ON_NULLPTR(this->value_ptr_);
[ros2_control_node-1]         112:     *this->value_ptr_ = value;
[ros2_control_node-1]         113:   }
[ros2_control_node-1]         114: };
[ros2_control_node-1] Segmentation fault (Signal sent by the kernel [(nil)])

I used this commit https://github.com/christophfroehlich/ros2_control_demos/tree/issue_239 and the current rolling branches of ros2_control and ros2_controllers

christophfroehlich commented 1 year ago

I tried to investigate the segmentation fault, but did not find the source. What I did: I extended the diff_drive_controller test, but this one does not throw any errors.

  state = controller_->get_node()->deactivate();
  ASSERT_EQ(state.id(), State::PRIMARY_STATE_INACTIVE);
  ASSERT_EQ(
    controller_->update(rclcpp::Time(0, 0, RCL_ROS_TIME), rclcpp::Duration::from_seconds(0.01)),
    controller_interface::return_type::OK);

  // activate again
  state = controller_->get_node()->activate();
  ASSERT_EQ(State::PRIMARY_STATE_ACTIVE, state.id());

  publish(linear, angular);
  // wait for msg is be published to the system
  ASSERT_TRUE(controller_->wait_for_twist(executor));
  ASSERT_EQ(
    controller_->update(rclcpp::Time(0, 0, RCL_ROS_TIME), rclcpp::Duration::from_seconds(0.01)),
    controller_interface::return_type::OK);

How can the set_value be called but the this-pointer be obsolete? The on_activate/on_deactivate methods of the diffbot_system do not change the pointers. Anyhow, the problem seems to be with the handles, and those are not managed by the hardware-interface of ros2_control_demo_example_2.

@destogl or @bmagyar: Any other suggestions how to narrow that down?

destogl commented 1 year ago

@christophfroehlich

We have here potentially multiple issues.

  1. DiffDriveController is not deactivated, but the hardware is deactivated. This causes nullptr – it seems that an interface or its value get invalidated in on_activate() method. (I would bet this is the issue) - in your test you are missing deactivation of controller.
  2. Automatic deactivation of controller is not implemented when hardware gets deactivated. Presently, this will not even be detected since the interface is still there is hardware is inactive.
  3. The lifecycle of hardware should work as expected. One should check if override of the methods is done correctly. Nevertheless, we might be missing something here… can you check the tests and see if they should be extended to detect if an implementation of SystemInterface::on_deactivate is actually called?
destogl commented 1 year ago

How can the set_value be called but the this-pointer be obsolete? The on_activate/on_deactivate methods of the diffbot_system do not change the pointers. Anyhow, the problem seems to be with the handles, and those are not managed by the hardware-interface of ros2_control_demo_example_2.

This sounds like quite a deep problem. Maybe something in the Resource Manager when we change interface flags is not right. But we are not moving them or invalidating them.

Do we have a test for this in Resource Manager?

Noel215 commented 1 year ago

Hi, I've created that PR for ros2_controllers that solves the issue for me, please check it out. :)

christophfroehlich commented 1 year ago
  1. DiffDriveController is not deactivated, but the hardware is deactivated. This causes nullptr – it seems that an interface or its value get invalidated in on_activate() method. (I would bet this is the issue) - in your test you are missing deactivation of controller.

@destogl to clarify this: the segmentation fault happens when the controller is deactivated while the hw component being still active (there are two topics mixed in this issue). I'll have a look on the PR of @Noel215, I fear I still don't understand the handle registration sufficiently.

  1. Automatic deactivation of controller is not implemented when hardware gets deactivated. Presently, this will not even be detected since the interface is still there is hardware is inactive.

ok

  1. The lifecycle of hardware should work as expected. One should check if override of the methods is done correctly. Nevertheless, we might be missing something here… can you check the tests and see if they should be extended to detect if an implementation of SystemInterface::on_deactivate is actually called?

We don't have any tests for the hardware components in this repo yet, see #46! do you have a hint to examples of tests we should add here?

christophfroehlich commented 1 year ago

I'll close this issue now, summary from the initial post of @xibeisiber :

lets discuss the other points in the linked issues