gzserver not dying after killing the simulation

musamarcusso commented 6 years ago

Hello everyone,

I have noticed that since updating to Gazebo 9.1, when I start Gazebo with roslaunch and then kill the simulation, gzserver does not die (and gzclient sometimes lingers too). I don't know whether this is caused by the Gazebo version, but it has been happening since the update. I start the simulation many times from an optimizer, so I see it often. I can log the exact count, but I would estimate it happens in about 30% of the runs. Has anyone else noticed this? Any ideas on how to solve it? Thanks in advance.

romainreignier commented 6 years ago

I have also noticed that behavior. If I wait a bit, gzserver seems to disappear. But of course it does not work if I relaunch the simulation just after I've killed it, so I have to run killall -9 gzserver.

Not sure, but this may be related to https://github.com/ros-simulation/gazebo_ros_pkgs/commit/c6d6c76746c9383c5efa6226ad6dc05f8cea244a ?

musamarcusso commented 6 years ago

Yes, this didn't happen for me before with Gazebo 7.0. I noticed it also affects my ROS tests when several of them start the simulation. I had to set a different port for the Gazebo instance in each test to make sure they always run without the error that an instance of gzserver is already running.
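A minimal sketch of the per-test port idea, assuming the port is selected through Gazebo's GAZEBO_MASTER_URI environment variable (default port 11345). The helper name gazebo_env and the instance_index parameter are hypothetical and only serve the illustration; the thread does not say how the ports were actually set.

```python
import os

DEFAULT_GAZEBO_PORT = 11345  # Gazebo's default master port


def gazebo_env(instance_index, base_port=DEFAULT_GAZEBO_PORT, base_env=None):
    """Return a copy of the environment with a per-instance GAZEBO_MASTER_URI.

    instance_index is a hypothetical per-test counter (0, 1, 2, ...), so each
    test talks to its own Gazebo master port.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GAZEBO_MASTER_URI"] = "http://localhost:{}".format(base_port + instance_index)
    return env


# Usage idea (package and launch file names are placeholders):
#   subprocess.Popen(["roslaunch", "my_package", "my_test.launch"],
#                    env=gazebo_env(instance_index=3))
```

With this, a gzserver left over from one test is still bound to its own port and should not block the next test from starting, although the leftover process still has to be cleaned up eventually.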

kev-the-dev commented 6 years ago

+1, gzserver does sometimes seem to take an incredibly long time to shut down. Having more plugins, more models, or a gzclient running makes it take longer.

However, I think there may be an actual bug here, perhaps a deadlock involving the ROS plugins, as I can produce a case where gzserver seems to hang forever (waited for 10+ minutes). I also noticed that SIGTERM (sent by kill <pid>) seems to work in these cases.

Here are some rambling notes for anyone trying to debug this deadlock:

Gazebo has its own signal handlers for both SIGINT (sent by Ctrl+C) and SIGTERM.
The gazebo_ros_api_plugin has its own SIGINT callback.
It would be helpful to run gzserver in a debugger (rosrun gazebo_ros debug) and see what the threads are doing during this deadlock.

Thanks for filing the issue for this, I'm sure many people have had this problem too.

musamarcusso commented 6 years ago

Hi @ironmig, are there any updates on this issue?

kev-the-dev commented 6 years ago

I spent a little time on this a few weeks back but haven't found anything.

musamarcusso commented 6 years ago

I also haven't figured out exactly what happens there. Was there an issue with the old script?

kev-the-dev commented 6 years ago

I don't think this is related to the script. To check, try manually sending SIGINT to gzserver:

ps aux | grep gzserver
kill -2 <pid associated with gzserver>

For me this still doesn't work.

Looking at GDB, mine seems to get stuck in Publisher::fini() within Gazebo. It seems to be destroying hundreds of publishers and waiting out the full 1-second timeout for each one. This is related to this gazebo issue. Of course, it's hard to tell whether we're all having the same problem.
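To put the per-publisher timeout in perspective, here is a back-of-the-envelope estimate. It assumes that publishers are torn down strictly one after another and that each one waits out the full timeout. Both are assumptions for illustration, not something confirmed in this thread, and the function name is made up.

```python
def worst_case_teardown_seconds(num_publishers, timeout_s=1.0):
    """Upper bound on teardown time if every publisher waits the full timeout.

    Assumes strictly sequential teardown (an assumption, not verified).
    """
    return num_publishers * timeout_s


for n in (100, 300, 600):
    minutes = worst_case_teardown_seconds(n) / 60.0
    print("{} publishers: up to {:.1f} min".format(n, minutes))
```

Under these assumptions, 600 publishers at 1 second each already add up to 10 minutes, so a shutdown that looks like a permanent hang could in principle just be a very slow teardown.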

tahsinkose commented 6 years ago

I have been dealing with this issue for roughly ten months as well. During that time, I ran a manual kill command after each simulation to clear residual Gazebo processes. I have now written a simple Bash script that checks for residual Gazebo processes at each simulation startup, so any leftover gzserver and gzclient processes are cleared automatically before a new simulation starts. If you are interested, here is a link to the gist.

This is not a direct fix for the bug described above, only a workaround. In my own projects, it has noticeably sped up feature development and debugging. Feel free to use it until the core issue is resolved!

EDIT: Link is corrected.
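The gist itself is not reproduced in this thread. The startup cleanup described above could be sketched in Python roughly as follows. This is not the author's Bash script: the function names, the 2-second grace period, and the use of pgrep -x to find processes are all assumptions made for the sketch.

```python
#!/usr/bin/env python3
"""Clear leftover Gazebo processes before starting a new simulation (sketch)."""
import os
import signal
import subprocess
import time

GAZEBO_PROCESSES = ("gzserver", "gzclient")


def parse_pids(pgrep_output):
    """Turn pgrep's one-PID-per-line output into a list of ints."""
    return [int(token) for token in pgrep_output.split()]


def find_pids(name):
    """Return PIDs of processes whose name is exactly `name` (via pgrep -x)."""
    result = subprocess.run(["pgrep", "-x", name],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return parse_pids(result.stdout)  # empty output means no match


def pid_alive(pid):
    """True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permission
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def send_signal(pid, sig):
    """Send `sig` to `pid`, ignoring processes that already exited."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        pass


def clear_residual(names=GAZEBO_PROCESSES, grace=2.0):
    """SIGTERM leftovers, wait up to `grace` seconds, then SIGKILL survivors."""
    pids = [pid for name in names for pid in find_pids(name)]
    for pid in pids:
        send_signal(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while any(pid_alive(pid) for pid in pids) and time.monotonic() < deadline:
        time.sleep(0.1)
    for pid in pids:
        if pid_alive(pid):
            send_signal(pid, signal.SIGKILL)
    return pids  # the PIDs that were found and signalled
```

Calling clear_residual() at the top of a launch wrapper, before roslaunch is started, would mirror the "check at each simulation startup" behavior. Like the original script, this is a workaround and does nothing about the underlying shutdown problem.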

josephcoombe commented 6 years ago

@tahsinkose I tried to follow the link in your comment, but it was broken.

tahsinkose commented 6 years ago

@josephcoombe Uh, sorry for the broken link. Just a typo. Here is the correct link.

mjcarroll commented 6 years ago

I just got this PR merged: https://bitbucket.org/osrf/gazebo/pull-requests/3014/wip-address-gzserver-shutdown-speed/diff

It should address some of the issues with long shutdown times with Gazebo.

ahmetsaglam commented 5 years ago

As a newcomer to both Ubuntu and Gazebo, I noticed with top that gzserver was still running after I killed the simulation. Even killall gzserver did not shut it down. Then I noticed that apport (Ubuntu's crash-reporting program) was consuming a lot of CPU to collect the crash report for the Gazebo shutdown, and it did not let me kill gzserver. Once the crash report was ready (apport's job was done), gzserver was killed. I know this does not fix why Gazebo crashes on shutdown, but it may save new users some time figuring out what is going on when "killall gzserver" does not seem to "work".

cosmicog commented 4 years ago

Yo, ros devs;

Since I have no patience for waiting on our precious simulator Gazebo to shut down before reopening it with all the other ROS nodes, I looked into it a bit to find a way to kill it properly. Since we most probably won't be running any other ROS nodes while the sim is closed, this is my way of shutting it down. I'm assuming that 99% of the time Gazebo is launched with roslaunch (which starts roscore automatically).

If I only kill gzserver and gzclient, I still get these two:

 /gazebo
 /gazebo_gui

when I run rosnode list. While these are somehow still awake, I see weird behaviour and cannot run any other roscore. Also, rosnode kill -a has no effect on these nodes. rosnode info /gazebo outputs the topic connections but ends with: "Communication with node[...] failed!"

Anyway, without wasting more words, I now use [Ctrl] + [C] followed by this alias to assassinate it properly without sending any extra signals or using sudo:

alias killg='killall gzclient; killall gzserver; killall rosmaster'

Sihoj commented 3 years ago

Having the same problem and also no patience, I made a small Python launcher that intercepts [Ctrl] + [C] and issues the kill commands after a short timeout.

Save the Python code below as, e.g., gzlauncher and make it executable with chmod +x gzlauncher. I also added it to my PATH, so I can run commands like gzlauncher roslaunch my_package my_launch or gzlauncher rosrun my_package my_node from anywhere and use Ctrl+C as usual to fully kill Gazebo, so it's immediately ready to relaunch.

Here's the Python code (feel free to use and adapt as you like):

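#!/usr/bin/env python
"""Run a command and, on Ctrl+C, make sure gzclient and gzserver are gone."""

import signal
import subprocess
import sys
import time

timeout_before_kill = 1.0  # [s] time given to Gazebo to exit on its own
timeout_after_kill = 1.0  # [s] time given to SIGTERM before sending SIGKILL


def signal_handler(sig, frame):
    # Ctrl+C normally reaches the child command too, so let it shut down first.
    time.sleep(timeout_before_kill)
    # Polite request first (SIGTERM); -q keeps killall quiet if nothing matches.
    subprocess.call("killall -q gzclient & killall -q gzserver", shell=True)
    time.sleep(timeout_after_kill)
    # Force-kill whatever is still alive (SIGKILL).
    subprocess.call("killall -9 -q gzclient & killall -9 -q gzserver", shell=True)
    sys.exit(0)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: gzlauncher <command> [args...]")
    signal.signal(signal.SIGINT, signal_handler)
    # Pass the arguments as a list so quoting and spaces are preserved.
    subprocess.call(sys.argv[1:])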

adityapande-1995 commented 2 years ago

https://github.com/ros-simulation/gazebo_ros_pkgs/pull/1376 should fix it

prathameshsphulpagar commented 3 months ago

1) killall gzserver
2) sudo pkill gzserver
3) If neither of these works, open a new terminal, run "htop", find "gzserver", and kill it manually.

ros-simulation / gazebo_ros_pkgs

gzserver not dying after killing the simulation #751