Multiple controller_manager spawn calls freeze a controller manager

mrjogo commented 7 years ago

Using https://github.com/davetcoleman/ros_control_boilerplate/tree/0.4.0/rrbot_control as an example, replace the ros_control_controller_manager node in https://github.com/davetcoleman/ros_control_boilerplate/blob/0.4.0/rrbot_control/launch/rrbot_hardware.launch#L21 with:

<node name="ros_control_controller_manager" pkg="controller_manager" type="controller_manager" respawn="false"
      output="screen" args="spawn joint_state_controller" />
<node name="ros_control_controller_manager" pkg="controller_manager" type="controller_manager" respawn="false"
      output="screen" args="spawn position_trajectory_controller" />

Build and run roslaunch rrbot_control rrbot_hardware.launch and the controller manager stops responding, either to service calls (ie, /rrbot/controller_manager/list_controllers) or Ctrl-C (if run outside roslaunch). The same thing happens if using spawner instead of controller_manager. I haven't tested it on a controller manager running outside of ros_control_boilerplate.

If it's intended that only one spawn call is allowed, I didn't see it in the documentation.

bmagyar commented 7 years ago

Your snippet clearly launches 2 controller managers so that's where the problem is. Starting up 2 controller_manager nodes which lead to a resource and name conflict. The reason it stops responding is probably because either something gets stuck or the first controller manager dies when the second one is started.

Take a look at this example where the spawner script is used to set up controllers on an already running controller_manager. If you intend to start things up separately or in a parametrised way, this is the way to go.

spawner is only a script that calls services load_controller, switch_controller. It has the same effect as you'd get by executing the same commands from the command line, see here.

Dave's boilerplate functionalities allow a quick setup by using the controller manager + spawner at the same time but if you need to launch things in a modular way I'd recommend looking at any of the PAL robots for inspiration.

mathias-luedtke commented 7 years ago

Your snippet clearly launches 2 controller managers so that's where the problem is.

I think the problem is that both use the same node name ros_control_controller_manager. So the first might get killed in the middle of the service call. Please try it with distinct names.

You can spawn multiple controllers at once, too:

<node name="ros_control_controller_manager" pkg="controller_manager" type="controller_manager" respawn="false"
      output="screen" args="spawn joint_state_controller position_trajectory_controller" />

(just realized that is this is the default for the boilerplate template)

spawner [...] It has the same effect as you'd get by executing the same commands from the command line

The spawner script is different as it will (try to) stop the controller if it stops (ctrl+c or kill). This need proper set-up to avoid problems. I tend to avoid it.

mathias-luedtke commented 7 years ago

Please try it with distinct names.

I have tested it myself, and I get the same bevahior, looks like a bug.

mrjogo commented 7 years ago

@bmagyar

Your snippet clearly launches 2 controller managers so that's where the problem is.

The controller_manager script is just an interface to an already running controller_manager (confusing name). It's fairly similar to the spawner script, which also causes the issue if two are started.

@ipa-mdl Yes, I still see the problem with different node names.

Conceptually, I think multiple spawn nodes should be able to run together (I was using it to start some controllers using --stopped and others started), so it's either a bug or the documentation should be clarified to indicate only one spawn node at a time should be run.

mathias-luedtke commented 7 years ago

Conceptually, I think multiple spawn nodes should be able to run together

Yes that should work! I have tested it with our Care-O-bot simulation on indigo without problems. Not sure if this is a regression bug, a bug with the default gazebo plugin or the boilerplate template..

bmagyar commented 7 years ago

@mrjogo

The controller_manager script is just an interface to an already running controller_manager (confusing name). It's fairly similar to the spawner script, which also causes the issue if two are started.

Yeah, sorry for rushing to conclusion, my eyes skipped the fact that it was the python script.

This is actually not that hard to test. I'd try launching this by hand and testing the idea against it.

bmagyar commented 6 years ago

Any update on this?

jacknlliu commented 6 years ago

Same behavior on Ubuntu 16.04, ROS kinetic. I think this is a regression bug.

And I find that after combined the controller_manager nodes to only one, rosservice call /controller_manager/list_controllers works, but if we use rosservice call /controller_manager/switch_controllers to stop controllers, then it will have no responds and rosservice call /controller_manager/list_controllers also has no responds again.

mathias-luedtke commented 6 years ago

TL;DR: This is no bug in the controller_manager, but an issue with the boilerplate template.

I went through the code of the ros_control_boilerplate and have found out that it is running its control loop with a ROS timer in conjunction with an AsyncSpinner with two threads.

The service handlers in controller_manager are protected by locks. So if two switch requests come in at the same time, the first one will be run, the second one will block. In case of the boilerplate template these two service calls block both available spinner threads, so the control loop cannot be run. This results in a deadlock because the first service handler waits for the control loop to process the switching.

We could improve the situation a little bit by resolving the deadlock with a timed lock (so it will recover after some time), but the boilerplate controller loop will freeze for a certain time. IMHO it would be best to migrate the boilerplate code to a sleep-based loop.

mathias-luedtke commented 6 years ago

I have proposed a fix: https://github.com/davetcoleman/ros_control_boilerplate/pull/20

alextoind commented 6 years ago

I think I'm facing a similar problem without ros_control_boilerplate installed: if I launch several rosrun controller_manager controller_manager spaw ... commands almost simultaneously, they stuck while spawning controllers, e.g.

Terminal 1:

rosrun controller_manager controller_manager spawn d1_joint_state_controller d1_c1 d1_c2
Loaded d1_joint_state_controller

Terminal 2:

rosrun controller_manager controller_manager spawn d2_joint_state_controller d2_c1 d2_c2
Loaded d2_joint_state_controller
Loaded d2_c1
Loaded d2_c2

Terminal 3: control node started previously (the same if it is started after)

...
[ INFO] [1524818643.691951727]: Is waiting for [controller_manager] to startup...
[ INFO] [1524818643.693721761]: [controller_manager] is ready!
[ INFO] [1524818643.697370788]: Is waiting for controller [d1_joint_state_controller] to startup...
[ INFO] [1524818643.840676782]: Controller [d1_joint_state_controller] is loaded!
[ INFO] [1524818643.840705571]: Is waiting for controller [d1_c1] to startup...

However, if I use load ... commands in the exactly same manner, it works as expected (apart form the needs to start controllers later). And of course it works if I spawn all the controllers in one shot.

The control node calls controller_manager.getControllerByName(c_name) and action_client.waitForServer() respectively to wait for controllers and action server startup (I'm using default joint trajectory controllers at the moment).

I don't know whether it is correct to add these waits or not, but I've added them in the past to be sure that everything was started correctly. Now that I need to add several spawners they seem to mess up. Indeed, if I remove both calls everything start correctly even with the spawn ... commands.

It seems a deadlock problem but it is difficult to figure out exactly where.

Any thoughts? Is it the right behavior?

miguelprada commented 6 years ago

@alextoind, which version of ros_control_boilerplate are you using?

It is possible that davetcoleman/ros_control_boilerplate#20 solves your problems? That PR has been very recently merged, so the fix is not yet available in the released packages.

bmagyar commented 6 years ago

Alessandro, please add the usual info too: op system, ROS distro, package version

On Apr 27, 2018 10:29, "Alessandro Tondo" notifications@github.com wrote:

I think I'm facing a similar problem without ros_control_boilerplate installed: if I launch several rosrun controller_manager controller_manager spaw ... commands almost simultaneously, they stuck while spawning controllers, e.g.

Terminal 1:

rosrun controller_manager controller_manager spawn d1_joint_state_controller d1_c1 d1_c2 Loaded d1_joint_state_controller

Terminal 2:

rosrun controller_manager controller_manager spawn d2_joint_state_controller d2_c1 d2_c2 Loaded d2_joint_state_controller Loaded d2_c1 Loaded d2_c2

Terminal 3: control node started previously (the same if it is started after)

... [ INFO] [1524818643.691951727]: Is waiting for [controller_manager] to startup... [ INFO] [1524818643.693721761]: [controller_manager] is ready! [ INFO] [1524818643.697370788]: Is waiting for controller [d1_joint_state_controller] to startup... [ INFO] [1524818643.840676782]: Controller [d1_joint_state_controller] is loaded! [ INFO] [1524818643.840705571]: Is waiting for controller [d1_c1] to startup...

However, if I use load ... commands in the exactly same manner, it works as expected (apart form the needs to start controllers later). And of course it works if I spawn all the controllers in one shot.

The control node calls controller_manager.getControllerByName(c_name) and action_client.waitForServer() respectively to wait for controllers and action server startup (I'm using default joint trajectory controllers at the moment).

I don't know whether it is correct to add these waits or not, but I've added them in the past to be sure that everything was started correctly. Now that I need to add several spawners they seem to mess up. Indeed, if I remove both calls everything start correctly even with the spawn ... commands.

It seems a deadlock problem but it is difficult to figure out exactly where.

Any thoughts? Is it the right behavior?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/ros-controls/ros_control/issues/265#issuecomment-384917739, or mute the thread https://github.com/notifications/unsubscribe-auth/ADXH4S7-xpPKjHKy2c2pc7NAVQaE9nkUks5tsuT3gaJpZM4ML4IW .

alextoind commented 6 years ago

@miguelprada as said at the beginning I have no ros_control_boilerplate installed in my system. Please, let me know whether it is better to open a new issue or not.

@bmagyar I'm sorry about that, here you are:

Ubuntu Xenial 16.04 (_kernel 4.13.0-39-generic x8664)
ROS Kinetic
ros_control 0.13.2 (0.13.2-0xenial-20180416-120230-0800) and all other packages up to date (everything "stock" apart from my control node)

Thank you very much

mathias-luedtke commented 6 years ago

everything "stock" apart from my control node

Is it based on something like ros_control_boilerplate (which is meant as a template)? Do you run the update loop with a ROS Timer? Any chance of deadlocks in your code?

miguelprada commented 6 years ago

@miguelprada as said at the beginning I have no ros_control_boilerplate installed in my system. Please, let me know whether it is better to open a new issue or not.

Oops I read your post too fast and thought it was a with instead of a without :man_facepalming:

In any case, the problem might be the same one that was fixed in davetcoleman/ros_control_boilerplate#20, as @ipa-mdl seems to suggest when asking whether your control node is based on ros_control_boilerplate. Here's his explanation of what was going on.

alextoind commented 6 years ago

@ipa-mdl these are my first-level dependencies, but I don't think there is nothing like ros_control_boilerplate:

roscpp
actionlib
combined_robot_hw
hardware_interface
joint_limits_interface
transmission_interface
urdf
control_msgs
control_toolbox
controller_manager
message_runtime
std_msgs
std_srvs

And I'm quite sure there aren't deadlocks in my code: it just initializes things up and relies on ROS Action Clients. However I do run the update loop of my control node through a ROS Timer. Is that the problem?

// inside constructor
control_timer_ = node_handle_.createWallTimer(control_duration_, &robotControl::controlCallback, this);
...

void robotControl::controlCallback(const ros::WallTimerEvent &timer_event) {
  update(timer_event.current_real, timer_event.current_real - timer_event.last_real);
}

void robotControl::update(const ros::WallTime& time, const ros::WallDuration& period) {
  // read the state from the hardware interface

  // update the controllers
  controller_manager_.update(ros::Time(time.toSec()), ros::Duration(period.toSec()));

  // write the commands to the hardware interface
}

@miguelprada I had already read that post, but since I use an AsyncSpinner initialized with just one thread (and Actions are initialized with spin_thread forced to false), I think that it can be something like that, but not exactly the same problem.

With my setup I should have only one thread accessing those resources. Isn't it?

Thank you very much for your time

mathias-luedtke commented 6 years ago

I had already read that post, but since I use an AsyncSpinner initialized with just one thread (and Actions are initialized with spin_thread forced to false), I think that it can be something like that, but not exactly the same problem.

Not the same problem, but the same effect. The service callback of the controller_manager passes data to the real-time system (run with update) and blocks until the request was processed by the loop. So the update must be independent from the ROS callback system.

Since you are running an AsyncSpinner, you can run the update in the main thread. Just like I did it in https://github.com/davetcoleman/ros_control_boilerplate/pull/20/commits/7199b3cb84534daf9887aa8b8a1f17f48e2e11b7.

alextoind commented 6 years ago

Thank you @ipa-mdl!

And what about a dedicated Callback Queue and AsyncSpinner pair to handle the controller_manager update loop? Would it be ok, or it may suffer the same deadlock problem?

An improved doc as suggested in #330 would be great for people like me!

mathias-luedtke commented 6 years ago

And what about a dedicated Callback Queue and AsyncSpinner pair to handle the controller_manager update loop? Would it be ok, or it may suffer the same deadlock problem?

This should work. But I don't see any benefit, especially if the main ends with ros::waitForShutdown().

alextoind commented 6 years ago

Just to be sure I had got it correctly ;)

Thank you very much!

ros-controls / ros_control

Multiple controller_manager spawn calls freeze a controller manager #265