Open musamarcusso opened 6 years ago
I have also noticed that behavior. If I wait a bit, gzserver seems to disappear. But of course it doesn't work if I relaunch the simulation just after I've killed it, so I have to killall -9 gzserver
Not sure, but this may be related to https://github.com/ros-simulation/gazebo_ros_pkgs/commit/c6d6c76746c9383c5efa6226ad6dc05f8cea244a ?
Yes, this didn't happen before with Gazebo 7.0 for me. I noticed this also affects my ROS tests if I have a number of them starting the simulation. I had to set different ports for the Gazebo instances in each one of the tests to be sure they always run without the error that an instance of gzserver is already running.
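One way to give each test its own Gazebo port, as described above, is to set GAZEBO_MASTER_URI per test. The sketch below is only an illustration, not the commenter's actual setup: the helper name gazebo_env_for_test and the port offset scheme are hypothetical, while 11345 is Gazebo's default master port.

```python
import os

# Gazebo's default master port; tests get ports above it so they never
# collide with a default (non-test) Gazebo instance.
GAZEBO_DEFAULT_PORT = 11345


def gazebo_env_for_test(test_index, base_port=GAZEBO_DEFAULT_PORT, host="localhost"):
    """Return a copy of the environment with a per-test GAZEBO_MASTER_URI.

    test_index is a hypothetical 0-based number assigned to each test.
    """
    env = dict(os.environ)
    port = base_port + 1 + test_index
    env["GAZEBO_MASTER_URI"] = "http://{}:{}".format(host, port)
    return env


if __name__ == "__main__":
    # Show the URIs three tests would get.
    for i in range(3):
        print(gazebo_env_for_test(i)["GAZEBO_MASTER_URI"])
```

The returned dictionary can be passed as env= to subprocess.Popen when a test starts its own simulation.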
+1 gzserver does seem to take an incredibly long time to shut down sometimes. Having more plugins, models, or a gzclient running makes it take even longer.
However, I think there may be an actual bug here, perhaps a deadlock involving the ROS plugins, as I can produce a case where gzserver seems to hang forever (waited for 10+ minutes). I also noticed that SIGTERM (sent by kill <pid>) seems to work in these cases.
Here are some rambling notes for anyone trying to debug this deadlock:
rosrun gazebo_ros debug
and see what the threads are doing while in this deadlock
Thanks for filing the issue for this, I'm sure many people have had this problem too.
Hi @ironmig, any new updates from this issue?
I spent a little time on this a few weeks back but haven't found anything.
I also haven't figured out exactly what happens there. Was there an issue with the old script?
I don't think this is related to the script. To check, try manually sending SIGINT to gzserver:
ps aux | grep gzserver
kill -2 <pid associated with gzserver>
For me this still doesn't work.
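The manual check above (find the PID, send SIGINT, see whether gzserver exits) can also be scripted. This is a minimal Python 3 sketch, assuming pgrep is installed; the function names are made up for illustration.

```python
import os
import signal
import subprocess
import time


def find_pids(name="gzserver"):
    """Return PIDs of processes whose name matches exactly (uses `pgrep -x`)."""
    result = subprocess.run(["pgrep", "-x", name],
                            stdout=subprocess.PIPE, universal_newlines=True)
    # pgrep prints one PID per line and prints nothing when there is no match.
    return [int(token) for token in result.stdout.split()]


def is_running(pid):
    """True if a process with this PID exists.

    Note: a zombie that its parent has not reaped yet still counts as existing.
    """
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def interrupt_and_wait(pid, timeout=10.0, poll=0.2):
    """Send SIGINT (same as `kill -2`) and report whether the process exited."""
    try:
        os.kill(pid, signal.SIGINT)
    except ProcessLookupError:
        return True  # already gone
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_running(pid):
            return True
        time.sleep(poll)
    return not is_running(pid)
```

Calling interrupt_and_wait(pid) for each PID returned by find_pids("gzserver") returns False for any gzserver that ignored SIGINT within the timeout, which is the situation described in this thread.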
Looking at GDB, mine seems to get stuck at Publisher::fini() within Gazebo. It seems to be destroying hundreds of publishers and waiting the full 1-second timeout for each one. Related to this gazebo issue. Of course, it's hard to tell if we're all having the same problem.
I have been dealing with this issue for roughly ten months as well. During that time, I ran a manual kill command after the end of each simulation to clear residual Gazebo processes. I have now written a simple Bash script that checks for residual Gazebo processes at each simulation startup. With that, I'm able to automatically clear any gzserver and gzclient processes before a new simulation starts. If you wonder, here is a link to the gist.
This is not a direct fix for the bug mentioned above, only a workaround. In my own projects, it really sped up feature development, debugging, etc. Feel free to use it until the core issue is resolved!
EDIT: Link is corrected.
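The gist itself is not reproduced in this thread. As a rough idea of what such a startup cleanup could look like, here is a Python sketch (not the Bash script from the gist). It reuses the killall invocations that appear elsewhere in this thread; the function names and the run/sleep parameters are hypothetical and only there so the logic can be tried without touching real processes.

```python
import subprocess
import time

# Process names to clear before starting a new simulation.
GAZEBO_PROCESSES = ("gzserver", "gzclient")


def build_kill_commands(names=GAZEBO_PROCESSES, force=False):
    """Build one `killall` command per process name.

    -q keeps killall quiet when no such process exists;
    -9 (SIGKILL) is added only when force=True, otherwise SIGTERM is sent.
    """
    commands = []
    for name in names:
        cmd = ["killall"]
        if force:
            cmd.append("-9")
        cmd += ["-q", name]
        commands.append(cmd)
    return commands


def clear_residual_gazebo(grace=2.0, run=subprocess.call, sleep=time.sleep):
    """Ask leftover Gazebo processes to terminate, then force-kill survivors.

    Returns the list of commands that were issued, in order.
    """
    issued = []
    for cmd in build_kill_commands(force=False):
        run(cmd)
        issued.append(cmd)
    sleep(grace)  # give SIGTERM a chance before escalating
    for cmd in build_kill_commands(force=True):
        run(cmd)
        issued.append(cmd)
    return issued
```

Calling clear_residual_gazebo() right before launching the simulation would mimic the "check at each startup" idea; the grace period is an arbitrary choice.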
@tahsinkose I tried to follow the link in your comment, but it was broken.
@josephcoombe Uh, sorry for the broken link. Just a typo. Here is the correct link.
I just got this PR merged: https://bitbucket.org/osrf/gazebo/pull-requests/3014/wip-address-gzserver-shutdown-speed/diff
It should address some of the issues with long shutdown times with Gazebo.
As a newbie to both Ubuntu and Gazebo, I noticed with top that gzserver was still running after I killed the simulation. Even when I tried killall gzserver, it did not shut down. Then I noticed that apport (Ubuntu's crash-reporting program) was consuming a lot of CPU to collect the crash report for the Gazebo shutdown process, and gzserver could not be killed in the meantime. Once the crash report was ready (apport's job was done), gzserver was killed. I know this does not fix why Gazebo crashes on shutdown, but it may save new users some time figuring out what is going on when "killall gzserver" does not seem to "work".
Yo, ros devs;
Since I have no patience for waiting for our precious simulator Gazebo to shut down before opening it back up with all the other ROS nodes, I inspected it a bit to find a way to kill it properly. Since we most probably won't be running any other ROS nodes while the sim is closed, this is my way of shutting it down. I'm assuming that 99% of the time Gazebo is launched with roslaunch (which starts roscore automatically).
If I only kill gzserver and gzclient, I can still see these two:
/gazebo
/gazebo_gui
when I run rosnode list. While these are somehow awake, I see weird behaviour and cannot run any other roscore. Also, rosnode kill -a has no effect on these nodes. rosnode info /gazebo outputs topic connections but says "Communication with node[...] failed!" at the end of the output.
Anyway, without wasting more words, I now use [Ctrl] + [C] plus this alias to assassinate it properly without sending any extra signals or using sudo:
alias killg='killall gzclient && killall gzserver && killall rosmaster'
Having the same problem and also having no patience, I made a small Python launcher that intercepts [Ctrl] + [C] and issues the kill commands after a small timeout.
Save the Python code below as e.g. gzlauncher and make it executable with chmod +x gzlauncher. I also added it to my PATH, so I can run commands like this from anywhere:
gzlauncher roslaunch my_package my_launch
or
gzlauncher rosrun my_package my_node
and use Ctrl+C as usual to fully kill Gazebo so it's immediately ready to relaunch.
Here's the Python code (feel free to use and adapt as you like):
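#!/usr/bin/env python
import sys, signal, subprocess, time

timeout_before_kill = 1.0  # [s] wait after Ctrl+C before sending SIGTERM
timeout_after_kill = 1.0   # [s] wait after SIGTERM before sending SIGKILL

def signal_handler(sig, frame):
    # Give the launched processes a moment to shut down on their own.
    time.sleep(timeout_before_kill)
    # Ask any remaining Gazebo processes to terminate (SIGTERM).
    subprocess.call("killall -q gzclient & killall -q gzserver", shell=True)
    time.sleep(timeout_after_kill)
    # Force-kill whatever is still alive (SIGKILL).
    subprocess.call("killall -9 -q gzclient & killall -9 -q gzserver", shell=True)
    sys.exit(0)

if __name__ == "__main__":
    signal.signal(signal.SIGINT, signal_handler)
    # Run the wrapped command (e.g. roslaunch ...) through the shell.
    cmd = ' '.join(sys.argv[1:])
    subprocess.call(cmd, shell=True)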
1) killall gzserver
2) sudo pkill gzserver
3) If neither of these works, open a new terminal, run "htop", find "gzserver", and kill it manually.
Hello everyone,
I have noticed that ever since updating to Gazebo 9.1, when I start Gazebo with roslaunch and then kill the simulation, gzserver does not die (and gzclient sometimes also lingers). I don't know if it has to do with the version of Gazebo, but I have only noticed it happening since the update. I have been starting the simulation multiple times with an optimizer, so I see it happen a lot. I can log how many times, but I would estimate it happens about 30% of the times the simulation starts. Has anyone else noticed this? Any ideas on how to solve it? Thanks in advance.