osrf / subt

This repository contains software for the virtual track of the DARPA SubT Challenge. Within this repository you will find Gazebo simulation assets, ROS interfaces, support scripts and plugins, and documentation needed to compete in the SubT Virtual Challenge.

Problems stopping all processes associated with ignition #95

Closed osrf-migration closed 5 years ago

osrf-migration commented 5 years ago

Original report (archived issue) by Sarah Kitchen (Bitbucket: snkitche).

The original report had attachments: hangingprocesslogs.tar.gz


I have been running several configurations with different .ign files, and I have intermittently had trouble stopping processes. I press Ctrl+C to exit, but then find hanging processes in htop or ps -X. Sometimes when I try killall with a PID, “no process found” is returned. Sometimes killall rosmaster has worked, but not always.

osrf-migration commented 5 years ago

Original comment by Sarah Kitchen (Bitbucket: snkitche).


Upgrading this to blocker. It does not happen every time, but it is frequent when launching multiple agents. Here is a screenshot of htop after killing rosmaster, then closing and reopening the terminal. No ignition or ros processes were listed with ps -X when I took this.

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Can you post the exact command you are using to launch these multiple agents?

osrf-migration commented 5 years ago

Original comment by Sarah Kitchen (Bitbucket: snkitche).


Well, this has happened with many different launch commands. I don’t recall which one I used before getting that screenshot. Here is what I’ve been trying to run today (with the same problem):

ign launch -v 4 virtual_stix.ign robotName1:=X3 robotConfig1:=X1_SENSOR_CONFIG_1 robotName2:=X2 robotConfig2:=X1_SENSOR_CONFIG_1 robotName3:=X1 robotConfig3:=X1_SENSOR_CONFIG_4

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Sarah Kitchen (snkitche), we will look into this. In the meantime, can you use competition.ign instead of virtual_stix.ign?

ign launch -v 4 competition.ign robotName1:=X3 robotConfig1:=X1_SENSOR_CONFIG_1 robotName2:=X2 robotConfig2:=X1_SENSOR_CONFIG_1 robotName3:=X1 robotConfig3:=X1_SENSOR_CONFIG_4

osrf-migration commented 5 years ago

Original comment by Sarah Kitchen (Bitbucket: snkitche).


This happens just as much with competition.ign. At first, I thought it was due to the logging and/or dynamic loading, but since it happens with virtual_stix.ign as well, that does not seem to be the cause. I’m having this issue today with the command:

ign launch -v 4 competition.ign robotName1:=X1 robotConfig1:=X1_SENSOR_CONFIG_4

osrf-migration commented 5 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


Can you post your ign-gazebo version using dpkg -l | grep ignition?

osrf-migration commented 5 years ago

Original comment by Sarah Kitchen (Bitbucket: snkitche).


ii ignition-blueprint 1.0.0-1~bionic

ii ignition-gazebo2 2.2.0-1~bionic

ii ignition-tools:amd64 0.2.0-1~bionic

Let me know if you want to see any of the libignition versions.

osrf-migration commented 5 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


Those look good. How about ros-melodic-ros1-ign-bridge?
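
For anyone gathering the same information, the package checks requested in this thread can be combined into one filter. This is only a minimal sketch: the function name filter_sim_packages is hypothetical, and the name prefixes it matches are an assumption based on the packages mentioned here.

```shell
# Reads `dpkg -l` output on stdin and prints "package version" for installed
# (state "ii") packages whose names look relevant to this issue.
# The prefixes below are assumptions taken from the packages named in this thread.
filter_sim_packages() {
  awk '$1 == "ii" && $2 ~ /^(ignition-|libignition-|ros-melodic-ros1-ign-bridge)/ { print $2, $3 }'
}

# Typical use on a machine with dpkg:
#   dpkg -l | filter_sim_packages
```

Filtering on the "ii" state skips packages that were removed but still have configuration files left behind, which would otherwise show up in a plain grep.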

osrf-migration commented 5 years ago

Original comment by Derek Knowles (Bitbucket: dknowles-ssci).


This also frequently happens to me when I run competition.ign.

ignition-blueprint          1.0.0-1~bionic
ignition-gazebo2            2.2.0-1~bionic
ignition-tools:amd64        0.2.0-1~bionic
ros-melodic-ros1-ign-bridge 0.3.1-1bionic

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Sarah, Derek,

The same was happening to me last week, but I cannot reproduce it anymore after rebuilding my workspace. Can you run the commands below and try it again?

cd ~/.ignition/fuel/fuel.ignitionrobotics.org/openrobotics/models/

rm -rfv *

cd ~/subt_ws/src/tunnel_circuit

hg pull && hg up

cd ~/subt_ws

catkin_make install

ign launch -v 4 competition.ign robotName1:=X1 robotConfig1:=X1_SENSOR_CONFIG_2

NOTE: The GUI will present you with empty panels for a few minutes until the models are downloaded.

osrf-migration commented 5 years ago

Original comment by Sarah Kitchen (Bitbucket: snkitche).


Nate:

ii ros-melodic-ros1-ign-bridge 0.3.1-1bionic

Alfredo, I’ve followed your update instructions. It will take a couple runs before I can see if I’m still having an issue. Will update this comment when I can tell.

The problem persists. New screenshot attached with a sorted tree. Before trying to kill anything, these processes are under a bash (which is under systemd).

Launch command:

ign launch -v 4 competition.ign robotName1:=X3 robotConfig1:=X1_SENSOR_CONFIG_1 robotName2:=X2 robotConfig2:=X1_SENSOR_CONFIG_1 robotName3:=X1 robotConfig3:=X1_SENSOR_CONFIG_4

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Thank you, Sarah, for testing this. Can you also please attach the .log file created in your home directory when this occurred?

osrf-migration commented 5 years ago

Original comment by Sarah Kitchen (Bitbucket: snkitche).


I have attached the directory from /home/snkitche/.ros/log that I believe corresponds to the above info.

The log files created by competition.ign in /home/snkitche (starting with subt_tunnel_qual) appear to be empty (0 bytes). I have also saved some console output from a different set of runs, but I am not entirely sure what is in there. The ImageDisplay seg fault happened again in that set.

osrf-migration commented 5 years ago

Original comment by Hector Escobar (Bitbucket: hector_escobar).


I get the same problem. When I use “top” to view the processes, there are usually several parameter_bridge processes that have not stopped. I’ve been using killall parameter_bridge to kill them, and then the rest seem to die. Sometimes I also have to kill ign.

osrf-migration commented 5 years ago

Original comment by Michael Carroll (Bitbucket: Michael Carroll).


I have managed to reproduce this behavior. Will work on a fix.

osrf-migration commented 5 years ago

Original comment by Derek Knowles (Bitbucket: dknowles-ssci).


Has anybody found a good temporary solution until this issue is fixed? I currently have to run this nearly every time I shut down competition.ign:

killall rosmaster roslaunch parameter_bridge ukf_localization_node roll_pitch_yawrate_thrust_controller_node
kill -9 $(pgrep ign)

osrf-migration commented 5 years ago

Original comment by Neil Johnson (Bitbucket: realdealneil1980).


I generally have the same problem. I installed the catkin_ws version of ignition gazebo on Wednesday of this week. The simulator is working in general, but I have to kill processes like Derek mentions above. Sometimes even that doesn’t seem to be enough, and I have to reboot the computer to get the vehicles to launch again. The problem seems worst when I close the ignition window early on…if I’ve run the simulator for a few minutes, it sometimes doesn’t happen.
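
The manual cleanup described in this thread could be wrapped in a small helper that escalates from SIGINT to SIGTERM to SIGKILL only when needed, rather than going straight to kill -9. This is a rough sketch, not a fix for the underlying bug. The function names, the GRACE variable, and the pattern list are all assumptions; the patterns are simply the process names people reported above.

```shell
# Patterns taken from process names reported in this thread, one per line.
# They are matched against the full command line (pgrep -f), because several
# names are longer than the 15 characters kept in the short process name.
# Keep them specific: -f will also match unrelated commands containing the text.
LEFTOVER_PATTERNS='rosmaster
roslaunch
parameter_bridge
ukf_localization_node
roll_pitch_yawrate_thrust_controller_node
ign launch'

# Succeeds if at least one of the given PIDs still exists.
any_alive() {
  for _pid in "$@"; do
    kill -0 "$_pid" 2>/dev/null && return 0
  done
  return 1
}

# Sends INT, then TERM, then KILL, pausing GRACE seconds (default 2) after
# each signal and returning as soon as every PID is gone.
stop_pids() {
  [ "$#" -gt 0 ] || return 0
  for _sig in INT TERM KILL; do
    any_alive "$@" || return 0
    kill -s "$_sig" "$@" 2>/dev/null || true
    sleep "${GRACE:-2}"
  done
  ! any_alive "$@"
}

# Looks up PIDs for each pattern and stops them.
stop_leftovers() {
  printf '%s\n' "$LEFTOVER_PATTERNS" | while IFS= read -r _pattern; do
    # Word splitting of the pgrep output into separate PIDs is intentional.
    stop_pids $(pgrep -f -- "$_pattern")
  done
}
```

After sourcing the file, running stop_leftovers once the simulation has been interrupted would attempt the cleanup. Trying SIGINT and SIGTERM first gives ROS nodes a chance to shut down cleanly before anything is killed outright.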

osrf-migration commented 5 years ago

Original comment by Michael Carroll (Bitbucket: Michael Carroll).


Currently, closing the GUI window does not terminate the rest of the simulation. To stop the simulation, use ctrl-c in the terminal where ign launch was started.

osrf-migration commented 5 years ago

Original comment by Michael Carroll (Bitbucket: Michael Carroll).


At least one deadlock issue was introduced at shutdown via https://osrf-migration.github.io/subt-gh-pages/#!/osrf/subt/pull-requests/184 and was resolved via https://osrf-migration.github.io/subt-gh-pages/#!/osrf/subt/pull-requests/194/fixing-a-deadlock-i-introduced-in-the-base/diff

At this point, I can’t seem to reproduce this behavior locally, but since the issue was opened before #184 was merged, I can’t be confident that #184 was the only factor. If you continue to see issues after #194, please let me know, so that I can work on constructing a case that consistently reproduces the bug.

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Michael’s changes to fix the problem have been merged, so update your local repo to get them.

cd ~/subt_ws/src/tunnel_circuit
hg pull && hg up
cd ~/subt_ws
catkin_make install

# To test:
ign launch -v 4 competition.ign robotName1:=X1 robotConfig1:=X1_SENSOR_CONFIG_2

osrf-migration commented 5 years ago

Original comment by Derek Knowles (Bitbucket: dknowles-ssci).


Thanks for your work on this issue, Michael Carroll (Michael Carroll).

I updated the subt repository as suggested and deleted the build/ devel/ install/ folders for good measure before running catkin_make install.

I still occasionally have processes labeled /usr/bin/ruby /usr/bin/ign launch -v 4 competition.ign robotName1:=X4 robotConfig1:=X4_SENSOR_CONFIG_2 running even after I ctrl+C and close the terminal where that command was run. Let me know if a log file would help.

ign processes are the only offenders I’ve noticed since your update. I haven’t seen any rosmaster, roslaunch, parameter_bridge, ukf_localization_node, or roll_pitch_yawrate_thrust_controller_node processes still running.

osrf-migration commented 5 years ago

Original comment by Michael Carroll (Bitbucket: Michael Carroll).


Yes, in that case, logs would be very helpful, at full verbosity (which you already have). Feel free to make a gist and post them there so that we don’t flood the thread here.

osrf-migration commented 5 years ago

Original comment by Derek Knowles (Bitbucket: dknowles-ssci).


This one actually had all those processes I mentioned except ign. I copied the contents from ~/.ros/log/latest/ here. Are there ignition specific logs I should add?

https://gist.github.com/betaBison/28e9ee9cd7984696537484b85a68f636

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Derek,

Can you also try with tunnel_circuit_practice.ign instead of competition.ign, and one of the practice tunnels?

ign launch -v 4 tunnel_circuit_practice.ign worldName:=tunnel_circuit_practice_01 robotName1:=X4 robotConfig1:=X4_SENSOR_CONFIG_2

osrf-migration commented 5 years ago

Original comment by Derek Knowles (Bitbucket: dknowles-ssci).


Yes, I will try on Monday.

Here’s another with the ign process still running.

https://gist.github.com/betaBison/f2abcfcd8bf0295ed4ca8fb5558dd0b2

osrf-migration commented 5 years ago

Original comment by Michael Carroll (Bitbucket: Michael Carroll).


We found a second deadlock that can affect the bringup portion of the process. In this case, ign-launch will hang when launching processes, and can only be killed with SIGTERM or higher. This ends up leaving a few of the residual processes around that we’ve been seeing. This PR (https://bitbucket.org/ignitionrobotics/ign-launch/pull-requests/36/eliminate-potential-deadlock-from-sigchld/diff) should address the deadlock on the way up.

osrf-migration commented 5 years ago

Original comment by Sarah Kitchen (Bitbucket: snkitche).


Can you change this issue back to Open at least until that PR is done?

osrf-migration commented 5 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


That PR has been merged and released.

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


You can either install the new Docker image by following the instructions in the link below:

      `https://osrf-migration.github.io/subt-gh-pages/#!/osrf/subt/wiki/tutorials/SystemSetupDockerhub`

Or run the commands below to update your catkin environment.

sudo apt update && sudo apt upgrade -y
sudo reboot

cd ~/subt_ws/src/tunnel_circuit
hg pull && hg update tunnel_circuit

source /opt/ros/melodic/setup.bash

rm -rfv ~/.ignition/fuel/fuel.ignitionrobotics.org/openrobotics/models/*

cd ~/subt_ws
catkin_make install
. ~/subt_ws/install/setup.bash
ign launch -v 4 competition.ign robotName1:=X1 robotConfig1:=X1_SENSOR_CONFIG_2

Open another terminal and run these commands:

. /opt/ros/melodic/setup.bash
. ~/subt_ws/install/setup.bash

roslaunch subt_example teleop.launch

osrf-migration commented 5 years ago

Original comment by Zbyněk Winkler (Bitbucket: Zbyněk Winkler (robotika)).


Today I was not able to quit with ctrl+c either. I had to take down the docker container (sha256:1145790d83973b37bd4851e6e329d24de062a99e19f898e22438c9d3882c00a6). I am glad I didn’t run it locally, as it would have been a pain to kill all that stuff by hand.

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Are you using the latest docker image from a couple of days ago?

Which Docker version are you running?

Any errors reported?

Did it happen only once with that image?

Were you running your controller in a different container when it happened?

Which launch configuration did you use and how many robots?

osrf-migration commented 5 years ago

Original comment by Martin Dlouhy (Bitbucket: robotikacz).


Well, as Zbynek mentioned in another post, “latest” is misleading because it can change at any time. Until you start versioning “releases”, he cannot be more precise than the digest (sha256:1145790d83973b37bd4851e6e329d24de062a99e19f898e22438c9d3882c00a6).

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Yes, I get the point about the version tag, and new images will have that. However, note that when issues are reported we need more than just the version tag or the sha256 number.
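
Some of the Docker details asked for above can be collected mechanically. The following is a hedged sketch only: docker_report is a hypothetical helper name, and the default image name is the one passed to run.bash in this thread. It does not capture the launch configuration or any error output, which still have to be reported by hand.

```shell
# Prints Docker details that may help when reporting a hang.
# The default image name is an assumption taken from this thread.
docker_report() {
  _image=${1:-nkoenig/subt-virtual-testbed}
  echo "== Docker report for $_image =="
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker: not installed"
    return 0
  fi
  # Client and server versions; prints an error if the daemon is not running.
  docker version --format 'client={{.Client.Version}} server={{.Server.Version}}' 2>&1 || true
  # Image ID (sha256) and any tags pointing at it.
  docker image inspect --format 'id={{.Id}} tags={{.RepoTags}}' "$_image" 2>&1 || true
  # Containers currently running from that image.
  docker ps --filter "ancestor=$_image" --format '{{.ID}} {{.Status}} {{.Command}}' 2>&1 || true
  return 0
}
```

Calling docker_report with no argument uses the default image; pasting its output alongside the exact launch command and robot count covers the identifying details in one place.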

osrf-migration commented 5 years ago

Original comment by Zbyněk Winkler (Bitbucket: Zbyněk Winkler (robotika)).


I just ran the “./run.bash nkoenig/subt-virtual-testbed tunnel_circuit_practice.ign robotName1:=X1 robotConfig1:=X1_SENSOR_CONFIG_1” example and nothing else (no controller, no other container, just that one thing). I tried to cancel it while it was still starting (I think; it’s hard to tell when it is done).

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).


Hi Zbyněk,

Thank you for that info. Can you elaborate on “it’s hard to tell when it is done”?

osrf-migration commented 5 years ago

Original comment by Zbyněk Winkler (Bitbucket: Zbyněk Winkler (robotika)).


It starts a lot of things and takes a long time (almost a minute), and there is no message in the output (or somewhere else?) that says something along the lines of “I am done loading and starting all the stuff, feel free to start up your controller”. Since I was just testing whether the X window appears, I killed it quite early.

I had a script for gazebo9 that would wait for a certain topic to appear to take a guess when it is done loading.
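
A wait-for-topic check of that kind might look roughly like the sketch below. It is not the original gazebo9 script. The function name wait_for_topic and the TOPIC_LIST_CMD override are assumptions of this sketch, and the topic to wait for has to be chosen by the user; the one in the usage comment is only a placeholder.

```shell
# Polls once per second until the topic is listed or the timeout (seconds,
# default 120) runs out. Returns 0 if the topic appeared, 1 otherwise.
# TOPIC_LIST_CMD is an assumption of this sketch: any command that prints one
# topic name per line. It defaults to `rostopic list`.
wait_for_topic() {
  _topic=$1
  _remaining=${2:-120}
  while [ "$_remaining" -gt 0 ]; do
    # The unquoted expansion is intentional so "rostopic list" splits into words.
    if ${TOPIC_LIST_CMD:-rostopic list} 2>/dev/null | grep -qxF -- "$_topic"; then
      return 0
    fi
    sleep 1
    _remaining=$((_remaining - 1))
  done
  return 1
}

# Example (placeholder topic name):
#   wait_for_topic /X1/cmd_vel 180 && echo "simulation looks ready"
```

A topic appearing only means the corresponding publisher or bridge has started, so this remains a guess at readiness rather than a guarantee that loading has finished.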

osrf-migration commented 5 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).