ros-controls / ros_control

Generic and simple controls framework for ROS
http://wiki.ros.org/ros_control
BSD 3-Clause "New" or "Revised" License
475 stars 307 forks source link

Something is wrong with the controller switching (tests) #418

Open mathias-luedtke opened 4 years ago

mathias-luedtke commented 4 years ago

I noticed some spurious failures in the cron CI jobs of industrial_ci.

One test sometimes fails with

Full test results for 'build/controller_manager_tests/test_results/controller_manager_tests/rosunit-controller_manager_scripts_test.xml'

-------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<testsuite errors="0" failures="1" name="unittest.suite.TestSuite" tests="1" time="2.807"><testcase classname="__main__.TestUtils" name="test_scripts" time="2.8065"><failure type="AssertionError">'my_controller1 - hardware_interface::EffortJointInterface ( stopped )\n' != 'my_controller1 - hardware_interface::EffortJointInterface ( running )\n'
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/root/target_ws/src/ros_control/controller_manager_tests/test/controller_manager_scripts.py", line 59, in test_scripts
    self.assertEqual(run_cm('list'), myc1_running)
  File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual
    raise self.failureException(msg)
</failure></testcase><system-out>&lt;![CDATA[
]]&gt;</system-out><system-err>&lt;![CDATA[
]]&gt;</system-err></testsuite>

In the logs further up I found

[ERROR] [1579997360.906338590]: Could not start controller with name my_controller3 because no controller with this name exists
[ERROR] [1579997373.912803741]: Controller manager: Cannot reload controller libraries because there are still 1 controllers running
[ERROR] [1579997373.914896691]: A controller named 'my_controller1' was already loaded inside the controller manager

Traceback (most recent call last):
  File "/root/target_ws/install/lib/controller_manager/spawner", line 207, in <module>
    if __name__ == '__main__': main()
  File "/root/target_ws/install/lib/controller_manager/spawner", line 199, in main
    resp = switch_controller(loaded, [], 2)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 435, in __call__
    return self.call(*args, **kwds)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 525, in call
    raise ServiceException("transport error completing service call: %s"%(str(e)))
rospy.service.ServiceException: transport error completing service call: unable to receive data from sender, check sender's logs for details
[WARN] [1579997381.977107]: Controller Spawner error while taking down controllers: service [/controller_manager/switch_controller] unavailable

I guess that this occurs because we run two test nodes at the same time: https://github.com/ros-controls/ros_control/blob/8d901e019a3bc70333afd69574d887afc244ad2c/controller_manager_tests/test/controller_manager_scripts.test#L14-L15

Furthermore, it looks like the (parallel) switching breaks the controller manager somehow. It is either a bug in our test node or in the switching code or both.

380 might be related.

I will keep monitoring this.

bmagyar commented 4 years ago

Thanks for the heads up! I'd definitely suggest creating a new .test file with a single node and test against that too.

jonbinney commented 4 years ago

Is it possible something broke ABI compatibility in ros-control recently? We just did updated our apt packages from the public ROS apt repo on melodic, and controller managers in our code stopped working. After recompile, they worked again. Here's the version we're on now:

iron@m2a:~$ apt-cache policy ros-melodic-controller-manager
ros-melodic-controller-manager:
  Installed: 0.16.0-1bionic.20200130.022304
  Candidate: 0.16.0-1bionic.20200130.022304
  Version table:
 *** 0.16.0-1bionic.20200130.022304 500
        500 http://packages.ros.org/ros/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status

(timing makes me think it may be the same problem as in this issue)

matthew-reynolds commented 4 years ago

Is it possible something broke ABI compatibility in ros-control recently?

395, almost definitely. It added new member functions and new member variables to Actuator, Joint, and Transmission handles. Looking back, that probably should have targetted Noetic...

bmagyar commented 4 years ago

Indeed, those changes introduced ABI breakage. While the option of targeting Noetic for it was there, we opted for Melodic to deploy this functionality earlier to upstream as well as Noetic is still somewhat unclear and this setup allows us to incrementally improve upon the current solution with another set of ABI-breaking changes when Noetic comes.

@jonbinney Sorry for the inconvenience this caused you, I've just put out an announcement on discourse to let others know: https://discourse.ros.org/t/ros-control-abi-breakage-in-melodic-2020-02/12681

jonbinney commented 4 years ago

No worries @bmagyar - I'm happy to have the improvements in melodic and the recompilation isn't a problem.

seebq commented 4 years ago

Oh man, this bit us, because (gulp) we had a forked version of diff_drive_controller. I know we shouldn't have, but are doing some custom stuff that shouldn't be upstream. We build our own packages and had to version bump to rebuild and deploy to field units. Doh.

Thanks for the improvements, as always, onward and upward!

bmagyar commented 4 years ago

How about subclassing? Others do a lot of that with the joint_trajectory_controller. Feel free to propose which functions should be virtual.

On Fri, 13 Mar 2020, 17:21 Charles Brian Quinn, notifications@github.com wrote:

Oh man, this bit us, because (gulp) we had a forked version of diff_drive_controller. I know we shouldn't have, but are doing some custom stuff that shouldn't be upstream. We build our own packages and had to version bump to rebuild and deploy to field units. Doh.

Thanks for the improvements, as always, onward and upward!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ros-controls/ros_control/issues/418#issuecomment-598830381, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA24PYOKME63FKZ5AK2X6D3RHJTSJANCNFSM4KLUDZLA .