Joint position controller commands not getting through (permanently zero)

pallgeuer commented 7 years ago

I am running a Gazebo simulation of a robot with 20 joints, each of which is being controlled by an effort_controllers::JointPositionController. Under ROS Indigo (Gazebo 2.x) everything works fine and I've been using it for a while. Under ROS Kinetic (Gazebo 7.x) however, very often when I launch the simulation, one or more joints permanently have a setpoint of zero, and those joints therefore don't move, even though valid commands are being published.

So for example, if on a particular run of the simulation the left_hip_yaw joint is the problem, that means that my robot control node is publishing on:

/my_robot/left_hip_yaw_position_controller/command

And I can confirm this with rostopic echo, that the data is really being published, and is non-zero. When I do however:

rostopic echo /nimbro_op/left_hip_yaw_position_controller/state

...I permanently see set_point: 0.0. I checked, and was able to definitively see that the reason for this is because despite this line:

sub_command_ = n.subscribe<std_msgs::Float64>("command", 1, &JointPositionController::setCommandCB, this);

in JointPositionController::init(), the callback JointPositionController::setCommandCB is never called. Using rosnode info and rostopic info says that the node is correctly subscribed though, and the ROS_DEBUG messages also say:

[DEBUG][/gazebo->Subscription::pendingConnectionDone]: Connecting via tcpros to topic [/nimbro_op/left_hip_yaw_position_controller/command] at host [nelson.local:58643] [DEBUG][/gazebo->TransportTCP::connect]: Resolved publisher host [nelson.local] to [127.0.1.1] for socket [72] [DEBUG][/gazebo->TransportTCP::initializeSocket]: Adding tcp socket [72] to pollset [DEBUG][/gazebo->TransportTCP::connect]: Async connect() in progress to [nelson.local:58643] on socket [72] [DEBUG][/gazebo->Subscription::pendingConnectionDone]: Connected to publisher of topic [/nimbro_op/left_hip_yaw_position_controller/command] at [nelson.local:58643]

The only indication, other than that the callback isn't being called, that something is wrong is the ROS_DEBUG message (once only):

[DEBUG][/gazebo->SubscriptionQueue::push]: Incoming queue was full for topic "/nimbro_op/left_hip_yaw_position_controller/command". Discarded oldest message (current queue size [0])

which indicates that the messages are never being processed at all, because otherwise the second time the buffer of size 1 is full after being emptied again once, the ROS_DEBUG message would appear again.

But on every launch it's other joint(s) having the problem, and even if one is not working, the other 19 JointPositionControllers are working. Also, if a JointPositionController is working after being launched, then it will continue to do so for as long as Gazebo is running, and if it is not, then it never suddenly starts working later.

Something is not working right, but what?

bmagyar commented 7 years ago

Hiya,

Quite a puzzle! Let's start at base 1, do you have a public simulation I could reproduce this with?

Random architect question: Any particular reason why you use one controller per joint instead of group controllers?

pallgeuer commented 7 years ago

Kind of - this is the simulation I am experiencing the problems with:

https://github.com/AIS-Bonn/humanoid_op_ros/tree/master/src/nimbro/hardware/nimbro_op_gazebo https://github.com/AIS-Bonn/humanoid_op_ros/tree/master/src/nimbro/hardware/nimbro_op_model

I launch nimbro_op.launch in nimbro_op_gazebo. You will need to set an environment variable like:

export NIMBRO_ROBOT_VARIANT=nimbro_op_hull

The repository is released for indigo (where everything works without a problem), but internally I am currently trying to move to kinetic, and that's where I am getting problems. To get the whole robot control node (the node that publishes the joint commands) running on your end, would take more effort than it's probably worth. I think some basic test node that just publishes (non-zero) joint commands for the robot's 20 joints would suffice to see the problem.

Can you explain more about the group controllers? I'm not sure what exactly that means.

Thanks

dcconner commented 7 years ago

I'm curious if you had any resolution to this? I'm trying to move to Kinetic and having issues with controllers with Gazebo. In my case, JointPositionControllers seem to work, but not a JointTrajectoryController

pallgeuer commented 7 years ago

I dug very deep for a while still, but could not come to a resolution. We have completely postponed moving to Kinetic in our project because of this issue (until there is a resolution) so it's kind of a big deal...

mathias-luedtke commented 7 years ago

Have you tried it with jade / gazebo 5? Have you tried to reload the zeroed controller?

Can you explain more about the group controllers? I'm not sure what exactly that means.

The group controllers forward multiple joints from a std_msgs/Float64MultiArray topic. Exampe: https://github.com/ipa320/cob_robots/blob/indigo_dev/cob_hardware_config/cob4-2/config/torso_controller.yaml#L29 (just ignore the required_drive_mode setting, these are needed for ros_canopen only)

pallgeuer commented 7 years ago

Trying to get our project running under jade / gazebo 5 is some effort, and whatever the result is, is very unlikely to yield any useful information in tracking and solving the bug, so no, I haven't tried it.

Reloading controller(s) does not help, as randomly some controllers will still be zeroed. That is, reloading a controller, whether zeroed or not, has a (seemingly) random chance of resulting in the reloaded controller being zeroed. So reloading can 'fix' the problem, create one that wasn't there before, or not change anything.

ros-controls / ros_controllers

Joint position controller commands not getting through (permanently zero) #248