ros / ros_comm

ROS communications-related packages, including core client libraries (roscpp, rospy, roslisp) and graph introspection tools (rostopic, rosnode, rosservice, rosparam).
http://wiki.ros.org/ros_comm
762 stars 912 forks source link

roslaunch multiple master race condition #1831

Open realtime-neil opened 5 years ago

realtime-neil commented 5 years ago

Here's a thing I can reproduce with some regularity (names changed to protect the guilty):

user@host:~$ for sysuser in foo_app_one foo_app_two foo_app_three; do
> echo
> echo "${user}"
> pstree --arguments --long --show-parents --ns-changes --uid-changes "${user}"
> done

foo_app_one
sh -c . /opt/ros/kinetic/setup.sh && roslaunch -v foo_app_one main.launch
  `-roslaunch /opt/ros/kinetic/bin/roslaunch -v foo_app_one main.launch
      |-rosmaster /opt/ros/kinetic/bin/rosmaster --core -p 11311 -w 3 __log:=/tmp/foo_app_one/log/UUID_ONE/master.log
      |   `-4*[{rosmaster}]
      |-rosout __name:=rosout __log:=/tmp/foo_app_one/log/UUID_ONE/rosout-1.log
      |   `-2*[{rosout}]
      |-foo_app_one __name:=foo_app_one __log:=/tmp/foo_app_one/log/UUID_ONE/foo_app_one-2.log
      |   `-51*[{foo_app_one}]
      `-2*[{roslaunch}]

foo_app_two
sh -c . /opt/ros/kinetic/setup.sh && roslaunch -v foo_app_two main.launch
  `-roslaunch /opt/ros/kinetic/bin/roslaunch -v foo_app_two main.launch
      |-foo_app_two /opt/ros/kinetic/lib/foo_app_two/foo_app_two __name:=foo_app_two __log:=/tmp/foo_app_two/log/UUID_TWO/foo_app_two-1.log
      |   `-node /opt/ros/kinetic/share/foo_app_two/server.js __name:=foo_app_two __log:=/tmp/foo_app_two/log/UUID_TWO/foo_app_two-1.log
      |       `-6*[{node}]
      `-2*[{roslaunch}]

foo_app_three
sh -c . /opt/ros/kinetic/setup.sh && roslaunch -v foo_app_three main.launch
  `-roslaunch /opt/ros/kinetic/bin/roslaunch -v foo_app_three main.launch
      |-rosmaster /opt/ros/kinetic/bin/rosmaster --core -p 11311 -w 3 __log:=/tmp/foo_app_three/log/UUID_TWO/master.log
      |   `-3*[{rosmaster}]
      |-rosout __name:=rosout __log:=/tmp/foo_app_three/log/UUID_TWO/rosout-1.log
      |   `-3*[{rosout}]
      |-foo_app_three __name:=foo_app_three __log:=/tmp/foo_app_three/log/UUID_TWO/foo_app_three-2.log
      |   `-14*[{foo_app_three}]
      `-2*[{roslaunch}]

Note how there are two rosmaster processes running. Note also how the first roslaunch is tied to rosmaster it started, while the second and third roslaunch are tied to the rosmaster the third roslaunch created.

This behavior would directly contradict the following statement:

A roslaunch will automatically start roscore if it detects that it is not already running (unless the --wait argument is supplied).

--- https://wiki.ros.org/roslaunch#roscore

Clearly, there is some gap between the instant roslaunch checks for an extant rosmaster and the instant roslaunch starts a rosmaster. Moreover, this gap is not being protected by host-wide mutex.

mikepurvis commented 5 years ago

Obviously this behaviour isn't ideal, especially the raciness of it. However, I'm not sure there's much that can really be done at this point, particularly given that working around the issue is so trivial.

If you know upfront that you're invoking multiple roslaunch instances (eg, from multiple systemd units or whatever), it's on you to add the --wait flag to all but one of them, or add it to all of them, and use your external process management system to run a standalone roscore.

Perhaps the docs should just be updated to make this expected usage more clear?

realtime-neil commented 5 years ago

@mikepurvis , I think we agree. I wasn't expecting the introduction of some host-wide locking mechanism, but I do think the documentation should sport an extremely visible caveat that warns users off the mistaken notion that a roslaunched rosmaster is a guaranteed singleton.

realtime-neil commented 5 years ago

@mikepurvis , how does one effect a change to the linked documentation? Should I register to edit the wiki?

mikepurvis commented 5 years ago

Yup, go for it! Unfortunately anonymous edits had to be disabled many moons ago on account of the spam.

realtime-neil commented 5 years ago

I've updated the documentation with a note about the roscore singleton (or lackthereof) and a link back to this ticket.

gavanderhoorn commented 4 years ago

Diff: here.