osrf / rmf_demos

Demos to showcase the capabilities of RMF
Apache License 2.0

gzserver shuts down sometimes during loop demo #28

Closed aaronchongth closed 3 years ago

aaronchongth commented 4 years ago

This happens rarely. gzserver seems to be the first process to die, and it gives little information about why. Leaving this here while I keep looking into it.

# terminal 1
ros2 launch demos office.launch.xml

# terminal 2
ros2 launch demos office_loop.launch.xml

https://gist.github.com/aaronchongth/ff4de5e65c9e290baa920aaad0c8b67f

Yadunund commented 4 years ago

Running into this error as well at times.

[ERROR] [gzserver-11]: process has died [pid 28055, exit code -11, cmd 'gzserver --verbose -s libgazebo_ros_init.so /home/yadu/ws_rmf/install/rmf_dp2_maps/share/rmf_dp2_maps/maps/dp2/dp2.world'].
[gzclient-12] [Msg] Waiting for master.
[gzclient-12] [Msg] Connected to gazebo master @ http://127.0.0.1:11345
[gzclient-12] [Msg] Publicized address: 192.168.0.196
[gzclient-12] ToggleFloors::ToggleFloors()
[gzclient-12] ToggleFloors::Load()
[gzclient-12] ToggleFloors::Load found a floor element: [B1]->[building_B1]
[gzclient-12] [Wrn] [Publisher.cc:135] Queue limit reached for topic /gazebo/world/user_camera/pose, deleting message. This warning is printed only once.
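
A note for reading the "process has died" lines in this thread: a negative exit code most likely means the process was terminated by a signal, following the Python subprocess convention where a return code of -N stands for signal N. The sketch below assumes that convention applies to the launch output here; describe_exit_code is a hypothetical helper name, not part of ros2 launch.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit code, assuming negative values mean 'killed by signal -code'."""
    if code >= 0:
        return f"exited with status {code}"
    try:
        # signal.Signals maps a signal number to its symbolic name.
        name = signal.Signals(-code).name
    except ValueError:
        name = f"unknown signal {-code}"
    return f"killed by {name}"


if __name__ == "__main__":
    # Exit codes that appear in the logs in this thread.
    for code in (-11, -15):
        print(code, "->", describe_exit_code(code))
```

Under that assumption, -11 corresponds to a segmentation fault (SIGSEGV), while -15 is SIGTERM, which is what a launch shutdown sends when a process does not stop after SIGINT.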

cnboonhan commented 4 years ago

Hi, yes I do encounter this as well!

intelnuc71 commented 4 years ago

Yeap. Same issue here. Right after the office world is loaded, gazebo shuts down.

-------------------------------------------------
Shutdown Error
[gzclient-9] [Msg] Publicized address: 192.168.100.203
[gzclient-9] [Wrn] [Event.cc:61] Warning: Deleting a connection right after creation. Make sure to save the ConnectionPtr from a Connect call
[ERROR] [gzclient-9]: process[gzclient-9] failed to terminate '5' seconds after receiving 'SIGINT', escalating to 'SIGTERM'
[ERROR] [gzserver-8]: process[gzserver-8] failed to terminate '5' seconds after receiving 'SIGINT', escalating to 'SIGTERM'
[INFO] [gzclient-9]: sending signal 'SIGTERM' to process[gzclient-9]
[INFO] [gzserver-8]: sending signal 'SIGTERM' to process[gzserver-8]
[ERROR] [gzserver-8]: process has died [pid 77104, exit code -15, cmd 'gzserver --verbose -s libgazebo_ros_factory.so -s libgazebo_ros_init.so /home/vgv/ws_rmf_demos/install/rmf_demo_maps/share/rmf_demo_maps/maps/office/office.world'].
[ERROR] [gzclient-9]: process has died [pid 77129, exit code -15, cmd 'gzclient --verbose /home/vgv/ws_rmf_demos/install/rmf_demo_maps/share/rmf_demo_maps/maps/office/office.world'].
[gzclient-9]
[gzclient-9]

gazebo --version
Gazebo multi-robot simulator, version 11.1.0
Copyright (C) 2012 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

Gazebo multi-robot simulator, version 11.1.0
Copyright (C) 2012 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

Achllle commented 4 years ago

I'm running into this issue with ros2 launch demos airport_terminal.launch.xml and with the clinic world, so I don't think it is specific to the loop demo. Gazebo crashes every single time for me on this launch. Installed in a Foxy Docker container, on an Intel NUC i5.

Copying all suspicious output:

[gzserver-9] libGL error: MESA-LOADER: failed to retrieve device information
[gzserver-9] libGL error: Version 4 or later of flush extension not found
[gzserver-9] libGL error: failed to load driver: i915
[gzserver-9] libGL error: failed to open /dev/dri/card0: No such file or directory
[gzserver-9] libGL error: failed to load driver: i965
[gzclient-10] libGL error: MESA-LOADER: failed to retrieve device information
[gzclient-10] libGL error: Version 4 or later of flush extension not found
[gzclient-10] libGL error: failed to load driver: i915
[gzclient-10] libGL error: failed to open /dev/dri/card0: No such file or directory
[gzclient-10] libGL error: failed to load driver: i965

[ERROR] [gzclient-10]: process has died [pid 28188, exit code -11, cmd 'gzclient --verbose /~/rmf_demos_ws/install/rmf_demo_maps/share/rmf_demo_maps/maps/airport_terminal/airport_terminal.world'].

Yadunund commented 4 years ago

Hi @Achllle ,

I believe the error in your case is due to graphics drivers being unavailable inside your Docker container. In general, for running Gazebo/Ignition simulations it is recommended to use a desktop installation of Ubuntu 20.04 with the Foxy binaries installed. That way the appropriate hardware drivers can be configured.
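
One quick way to test this hypothesis from inside the container is to look for the DRM device nodes that the libGL errors above complain about (/dev/dri/card0). The following is a minimal sketch; list_dri_nodes is a hypothetical helper, and an empty result only suggests, rather than proves, that the GPU was not passed through to the container.

```python
import os
from typing import List


def list_dri_nodes(dri_dir: str = "/dev/dri") -> List[str]:
    """Return the device nodes visible under dri_dir, or [] if the directory is absent."""
    if not os.path.isdir(dri_dir):
        return []
    return sorted(os.listdir(dri_dir))


if __name__ == "__main__":
    nodes = list_dri_nodes()
    if nodes:
        print("DRM nodes visible:", ", ".join(nodes))
    else:
        print("No /dev/dri nodes visible; hardware-accelerated OpenGL is unlikely to work here.")
```

If nothing is listed, one common approach is to start the container with the device exposed, for example with docker run --device /dev/dri, although whether that is sufficient depends on the host drivers.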

Achllle commented 4 years ago

I should mention that the office world works fine for me, so that seems like an unlikely explanation, but I might be wrong.

codebot commented 4 years ago

Ah yes, that is very good to know.

After a crash during the airport world startup, can you skim through the end of the OGRE log in ~/.gazebo/ogre.log ? That is usually where any graphics-related errors will have the most detailed information.
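
To make that skim less manual, something like the sketch below could pull out only the suspicious lines near the end of the log. The keyword list and the suspicious_tail name are assumptions chosen for illustration; OGRE does not define a fixed set of markers, so a clean result here does not rule out a graphics problem.

```python
from collections import deque
from pathlib import Path
from typing import List, Sequence

# Assumed keywords; adjust to whatever looks relevant in your own log.
KEYWORDS = ("error", "exception", "failed", "warning")


def suspicious_tail(log_path, last_n: int = 200,
                    keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Return lines among the last `last_n` of the log that contain any keyword."""
    with open(log_path, errors="replace") as f:
        # A bounded deque keeps only the final `last_n` lines in memory.
        tail = deque(f, maxlen=last_n)
    return [line.rstrip("\n") for line in tail
            if any(k in line.lower() for k in keywords)]


if __name__ == "__main__":
    log = Path.home() / ".gazebo" / "ogre.log"
    if log.exists():
        for line in suspicious_tail(log):
            print(line)
    else:
        print(f"{log} not found")
```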

Achllle commented 4 years ago

Skimmed through the last 100 lines, nothing stood out in particular. Here's the last couple of lines:

21:54:09: Initialising resource group General
21:54:09: Texture: /root/.gazebo/models/WhiteChipChair/meshes//WhiteChipChairDiffuse.png: Loading 1 faces(PF_A8R8G8B8,1024x1024x1) with 5 hardware generated mipmaps from Image. Internal format is PF_A8R8G8B8,1024x1024x1.
21:54:10: Added resource location '/root/.gazebo/models/ElectronicsRecycling/meshes/' of type 'FileSystem' to resource group 'General' with recursive option
21:54:10: Initialising resource group General
21:54:10: Texture: /root/.gazebo/models/ElectronicsRecycling/meshes//ElectronicsRecycling_Diffuse.png: Loading 1 faces(PF_A8R8G8B8,1024x1024x1) with 5 hardware generated mipmaps from Image. Internal format is PF_A8R8G8B8,1024x1024x1.
21:54:10: [SkyX] VClouds warning: unregistered camera registered, manual unregistering is needed before camera destruction
21:54:10: Texture: spot_shadow_fade.png: Loading 1 faces(PF_R8G8B8,128x128x1) with 5 hardware generated mipmaps from Image. Internal format is PF_X8R8G8B8,128x128x1.

There are no logs in the 2 seconds before or after the crash. Looking at system resources, they are quite high but not maxed out.

codebot commented 4 years ago

Can you try running it natively on your machine? I'm wondering if it's somehow related to Docker running out of RAM or some other resource. The airport world is much larger than the office world in every respect, so it's definitely hammering the system much harder.
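
If running natively is not convenient, a rough way to check the RAM theory is to read the memory limit the container was given. The sketch below assumes the standard Linux cgroup file locations (memory.max for cgroup v2, memory.limit_in_bytes for cgroup v1); parse_limit and read_memory_limit are hypothetical helper names, and the threshold used to detect the cgroup v1 "unlimited" sentinel is an assumption.

```python
from pathlib import Path
from typing import Optional

# Candidate locations: cgroup v2 first, then cgroup v1.
LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def parse_limit(raw: str) -> Optional[int]:
    """Parse a cgroup memory limit; return None when it is effectively unlimited."""
    raw = raw.strip()
    if raw == "max":  # cgroup v2 spelling of "no limit"
        return None
    value = int(raw)
    # cgroup v1 reports a huge sentinel value when no limit is set.
    return None if value >= 1 << 60 else value


def read_memory_limit() -> Optional[int]:
    """Return the memory limit in bytes, or None if unlimited or not readable."""
    for path in LIMIT_FILES:
        try:
            return parse_limit(Path(path).read_text())
        except (OSError, ValueError):
            continue
    return None


if __name__ == "__main__":
    limit = read_memory_limit()
    if limit is None:
        print("No memory limit detected (or cgroup files not readable).")
    else:
        print(f"Memory limit: {limit / 2**30:.2f} GiB")
```

Watching docker stats on the host while the airport world loads gives a similar picture without touching the container.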

Yadunund commented 3 years ago

Hi @Achllle ,

I was able to reproduce your issue with the Gazebo client crashing in the airport_terminal world. It turns out this happens because of missing files in one of the models. I've logged the issue along with the fix in this ticket: https://github.com/osrf/rmf_demos/issues/179
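
For anyone who wants to check their own model cache for this kind of problem, here is a minimal sketch. It is not the fix from the linked ticket. It assumes models live in a single directory (such as ~/.gazebo/models, as seen in the OGRE log above) and that resources are referenced with model:// URIs inside each model's .sdf files; find_missing_model_files is a hypothetical helper, and models on other GAZEBO_MODEL_PATH entries will be reported as missing.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches <uri>model://Some/relative/path</uri> and captures the relative path.
URI_PATTERN = re.compile(r"<uri>\s*model://([^<\s]+)\s*</uri>")


def find_missing_model_files(models_root) -> List[Tuple[str, str]]:
    """Return (sdf file, referenced path) pairs whose target is absent under models_root."""
    root = Path(models_root)
    missing = []
    for sdf in sorted(root.glob("*/*.sdf")):
        text = sdf.read_text(errors="replace")
        for rel in URI_PATTERN.findall(text):
            if not (root / rel).exists():
                missing.append((sdf.relative_to(root).as_posix(), rel))
    return missing


if __name__ == "__main__":
    models = Path.home() / ".gazebo" / "models"
    if models.is_dir():
        for sdf_file, ref in find_missing_model_files(models):
            print(f"{sdf_file}: missing {ref}")
    else:
        print(f"{models} not found")
```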