lopsided98 / nix-ros-overlay

ROS overlay for the Nix package manager
Apache License 2.0

Gazebo crash on startup #161

Closed: erdnaxe closed this issue 5 months ago

erdnaxe commented 2 years ago

I installed nix-ros-overlay.rosPackages.noetic.gazebo. When I run gazebo --verbose, I get:

Gazebo multi-robot simulator, version 11.9.0
Copyright (C) 2012 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

[Msg] Waiting for master.
[Msg] Connected to gazebo master @ http://127.0.0.1:11345
[Msg] Publicized address: 192.168.56.1
Gazebo multi-robot simulator, version 11.9.0
Copyright (C) 2012 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

[Msg] Waiting for master.
[Msg] Connected to gazebo master @ http://127.0.0.1:11345
[Msg] Publicized address: 192.168.56.1
[Msg] Loading world file [/nix/store/qgrfkh8bqddgzkawb8irnwrxxrkdq8q5-gazebo-11.9.0/share/gazebo-11/worlds/empty.world]
[ALSOFT] (EE) Failed to set real-time priority for thread: Operation not permitted (1)
[ALSOFT] (EE) Failed to set real-time priority for thread: Operation not permitted (1)
[Err] [Scene.cc:227] Service call[/shadow_caster_material_name] timed out
[Err] [Scene.cc:249] Service call[/shadow_caster_render_back_faces] timed out
[Wrn] [Scene.cc:462] Ignition transport [/scene_info] service call failed, falling back to gazebo transport [scene_info] request.
[Wrn] [GuiIface.cc:120] Qt has caught an exception thrown from an event handler. Throwing
exceptions from an event handler is not supported in Qt.
You must not let any exception whatsoever propagate through Qt code.
If that is not possible, in Qt 5 you must at least reimplement
QCoreApplication::notify() and catch all exceptions there.

[Err] [main.cc:37] Ogre Error:RuntimeAssertionException: Ogre/ShadowExtrudePointLight not found. Verify that you referenced the 'ShadowVolume' folder in your resources.cfg in initialise at /build/source/OgreMain/src/OgreShadowVolumeExtrudeProgram.cpp (line 70)

[Image: gazebo_crash]

A window with a black 3D viewport appears for a few seconds before Gazebo crashes. I am using Intel integrated graphics on NixOS 21.11. Should Gazebo work out of the box? Did I miss something?

lopsided98 commented 2 years ago

Yes, Gazebo is broken right now. A few weeks ago I fixed some of the issues, but it still ended up crashing a little later in the startup process. I'm not sure I got the same error as you, but I do remember that the window would appear for a short time before it crashed. Unfortunately, I'm not sure when I will have the time to really dig into it.

If you are able to debug it, that would be really helpful. I'm not certain when it stopped working, but it may have been when I upgraded to 11.9.0, so you may want to try reverting that update to help narrow down the cause.

erdnaxe commented 2 years ago

I get the same issue when I change version in pkgs/gazebo/default.nix to:

My test command is nix-build . -A gazebo && ./result/bin/gazebo --verbose.

I am now trying to narrow down the issue by trying different Ogre versions.

Edit: my setup was wrong; for some reason I was still running Gazebo 11.9.0.

erdnaxe commented 2 years ago

This issue looks like https://github.com/osrf/gazebo/issues/2700

erdnaxe commented 2 years ago

~Gazebo built with ogre replaced by ogre1_9 or ogre1_10 also crashes.~

~gazebo_9 also crashes on master.~

~Same error if I checkout https://github.com/lopsided98/nix-ros-overlay/commit/effc6ce55df4c972608cb29437dcd3a6be3705bc and build gazebo_9.~

~Maybe it is caused by an external library that got updated in nixpkgs?~

erdnaxe commented 2 years ago

Ah something is wrong with my setup:

$ ls -l
[...]
lrwxrwxrwx  1 erdnaxe users   57 Jan 10 10:09 result -> /nix/store/nwl4zdpsspw5kabcwbj5a7m7xvhz79sd-gazebo-9.18.0
$ ./result/bin/gazebo --verbose
Gazebo multi-robot simulator, version 11.9.0
[...]

For some reason, running bin/gazebo from the gazebo_9 build starts 11.9.0, so all my earlier tests are invalid.
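The thread does not say why a gazebo-9.18.0 build would print an 11.9.0 banner. One thing worth ruling out is that a different `gzserver` is being picked up from `PATH`. The sketch below is only a diagnostic under that assumption (it is not confirmed anywhere in this issue that the launcher resolves its helpers this way), and `check_gz_path` is a hypothetical helper name:

```shell
# Hypothetical helper: compare the gzserver found on PATH with a build result.
# Usage: check_gz_path ./result
check_gz_path() {
  # Resolve the result symlink to its real store path.
  result_dir=$(readlink -f "$1") || return 1
  # Find the gzserver that a PATH lookup would pick.
  found=$(command -v gzserver) || { echo "gzserver not on PATH"; return 1; }
  found=$(readlink -f "$found")
  case "$found" in
    "$result_dir"/*) echo "match: $found" ;;
    *) echo "mismatch: PATH resolves to $found, build is $result_dir" ;;
  esac
}
```

If this reports a mismatch, starting `./result/bin/gzserver --verbose` directly and reading its version banner would show whether the build itself is the expected version.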

traversaro commented 2 years ago

> Gazebo built with ogre replaced by ogre1_9 or ogre1_10 also crashes.

Do you get the same error:

[Err] [main.cc:37] Ogre Error:RuntimeAssertionException: Ogre/ShadowExtrudePointLight not found. Verify that you referenced the 'ShadowVolume' folder in your resources.cfg in initialise at /build/source/OgreMain/src/OgreShadowVolumeExtrudeProgram.cpp (line 70)

even for Ogre 1.9 and Ogre 1.10? That would be quite strange.

beezow commented 2 years ago

I was able to successfully run gazebo by using gazebo_9 rather than gazebo. While this is not a fix for the bug, I wanted to point it out as a temporary bypass for people that need to get gazebo running.

beezow commented 2 years ago

I also tried reverting to ogre1_10 within gazebo 11.9. I was able to launch and run it successfully, albeit with the following errors.

[Err] [Scene.cc:227] Service call[/shadow_caster_material_name] timed out
[Err] [Scene.cc:249] Service call[/shadow_caster_render_back_faces] timed out

However, everything worked as expected. Based on other issues, pinning the Ogre version seems to be the accepted fix. Maybe @traversaro can comment more on this, as they seem to have been involved in the upstream issues and I am not familiar with either project. My branch is here: beezow/nix-ros-overlay. One caveat is that @lopsided98 explicitly updated the Ogre version in https://github.com/lopsided98/nix-ros-overlay/commit/6178a8c42f31f1a70d9fec3e68c4bfb8e1e299c0. Perhaps I am missing something?

beezow commented 2 years ago

So while gazebo was working, when trying to use gazebo_ros there were still some problems.

➤ rosrun gazebo_ros gzserver -e ode worlds/emtpy.world --verbose $final
Gazebo multi-robot simulator, version 11.9.0
Copyright (C) 2012 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

munmap_chunk(): invalid pointer
/nix/store/8fj7m50ag8si1gzgdya7j2v2b9r5z1dw-ros-noetic-gazebo-ros-2.9.2-r1/lib/gazebo_ros/gzserver: line 41: 179679 Aborted                 (core dumped) GAZEBO_MASTER_URI="$desired_master_uri" GAZEBO_MODEL_DATABASE_URI="$desired_model_database_uri" gzserver $final

I tracked this down to libgazebo_ros_paths_plugin.so, which triggers the fault. When it is removed from the shared library list passed to gzserver, the error goes away. I tested this by calling gzserver directly with the arguments from the gazebo_ros wrapper script, which can be found in /nix/store/8fj7m50ag8si1gzgdya7j2v2b9r5z1dw-ros-noetic-gazebo-ros-2.9.2-r1/lib/gazebo_ros.

gzserver -e ode worlds/empty.world --verbose -s /nix/store/8fj7m50ag8si1gzgdya7j2v2b9r5z1dw-ros-noetic-gazebo-ros-2.9.2-r1/lib/libgazebo_ros_paths_plugin.so -s /nix/store/8fj7m50ag8si1gzgdya7j2v2b9r5z1dw-ros-noetic-gazebo-ros-2.9.2-r1/lib/libgazebo_ros_api_plugin.so

I don't know whether this is related, but I wanted to document my efforts to get gazebo working again.

beezow commented 2 years ago

After some debugging, I found that the call ros::package::getPlugins("gazebo_ros","gazebo_media_path",gazebo_media_paths); in gazebo_ros_paths_plugin.cpp is causing the crash. In fact, every call to ros::package::getPlugins triggers the fault. My GDB stack trace:

#0  0x00007ffff3b98baa in raise () from /nix/store/wl60dr9p15rwf53gxz61ijgisc1zdjc7-glibc-2.33-59/lib/libc.so.6
#1  0x00007ffff3b83523 in abort () from /nix/store/wl60dr9p15rwf53gxz61ijgisc1zdjc7-glibc-2.33-59/lib/libc.so.6
#2  0x00007ffff3bd92e8 in __libc_message () from /nix/store/wl60dr9p15rwf53gxz61ijgisc1zdjc7-glibc-2.33-59/lib/libc.so.6
#3  0x00007ffff3be0b0a in malloc_printerr () from /nix/store/wl60dr9p15rwf53gxz61ijgisc1zdjc7-glibc-2.33-59/lib/libc.so.6
#4  0x00007ffff3be0f3c in munmap_chunk () from /nix/store/wl60dr9p15rwf53gxz61ijgisc1zdjc7-glibc-2.33-59/lib/libc.so.6
#5  0x00007ffff3be5ac3 in free () from /nix/store/wl60dr9p15rwf53gxz61ijgisc1zdjc7-glibc-2.33-59/lib/libc.so.6
#6  0x00007ffff3c221bd in closedir () from /nix/store/wl60dr9p15rwf53gxz61ijgisc1zdjc7-glibc-2.33-59/lib/libc.so.6
#7  0x00007ffff4cec375 in boost::filesystem::detail::dir_itr_close(void*&, void*&) () from /nix/store/m53nb05wzx9d15k4xp8i29mdf104a896-boost-1.73.0/lib/libboost_filesystem.so.1.73.0
#8  0x00007fffe7f1519b in boost::filesystem::detail::dir_itr_imp::~dir_itr_imp (this=0xd69e00, __in_chrg=<optimized out>) at include/boost/filesystem/directory.hpp:292
#9  boost::sp_adl_block::intrusive_ptr_release<boost::filesystem::detail::dir_itr_imp, boost::sp_adl_block::thread_safe_counter> (p=0xd69e00) at include/boost/smart_ptr/intrusive_ref_counter.hpp:173
#10 boost::intrusive_ptr<boost::filesystem::detail::dir_itr_imp>::~intrusive_ptr (this=0x7ffffffcf6b0, __in_chrg=<optimized out>) at include/boost/smart_ptr/intrusive_ptr.hpp:98
#11 boost::filesystem::directory_iterator::~directory_iterator (this=0x7ffffffcf6b0, __in_chrg=<optimized out>) at include/boost/filesystem/directory.hpp:307
#12 rospack::Rosstackage::isStackage (this=this@entry=0x7fffe7f45200 <rospack::ROSPack::run(int, char**)::rp>, path=...) at /build/rospack-release-release-noetic-rospack-2.6.2-1/src/rospack.cpp:341
#13 0x00007fffe7f19feb in rospack::Rosstackage::crawlDetail (this=0x7fffe7f45200 <rospack::ROSPack::run(int, char**)::rp>, path=..., force=false, depth=2, collect_profile_data=false, profile_data=..., 
    profile_hash=...) at /build/rospack-release-release-noetic-rospack-2.6.2-1/src/rospack.cpp:1484
#14 0x00007fffe7f1a3fc in rospack::Rosstackage::crawlDetail (this=this@entry=0x7fffe7f45200 <rospack::ROSPack::run(int, char**)::rp>, path=..., force=<optimized out>, depth=depth@entry=1, 
    collect_profile_data=collect_profile_data@entry=false, profile_data=..., profile_hash=...) at /build/rospack-release-release-noetic-rospack-2.6.2-1/src/rospack.cpp:1546
#15 0x00007fffe7f1b721 in rospack::Rosstackage::crawl (this=this@entry=0x7fffe7f45200 <rospack::ROSPack::run(int, char**)::rp>, search_path=..., force=force@entry=false)
    at /build/rospack-release-release-noetic-rospack-2.6.2-1/src/rospack.cpp:392
#16 0x00007fffe7f2d04d in rospack::rospack_run (argc=argc@entry=4, argv=argv@entry=0xc74630, rp=..., output=...) at /build/rospack-release-release-noetic-rospack-2.6.2-1/src/rospack_cmdline.cpp:218
#17 0x00007fffe7f29319 in rospack::ROSPack::run (this=this@entry=0x7fffe7f5c0c0 <ros::package::command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::rp>, 
    argc=argc@entry=4, argv=argv@entry=0xc74630) at /build/rospack-release-release-noetic-rospack-2.6.2-1/src/rospack_backcompat.cpp:46
#18 0x00007fffe7f297aa in rospack::ROSPack::run (this=this@entry=0x7fffe7f5c0c0 <ros::package::command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::rp>, cmd=...)
    at /build/rospack-release-release-noetic-rospack-2.6.2-1/src/rospack_backcompat.cpp:80
#19 0x00007fffe7f4f8e6 in ros::package::command (_cmd=...) at /build/ros-release-release-noetic-roslib-1.15.8-1/src/package.cpp:55
#20 0x00007fffe7f4fbfa in ros::package::command (cmd=..., output=...) at /build/ros-release-release-noetic-roslib-1.15.8-1/src/package.cpp:73
#21 0x00007fffe7f501f1 in ros::package::getPlugins (package=..., attribute=..., packages=..., plugins=..., force_recrawl=<optimized out>) at /build/ros-release-release-noetic-roslib-1.15.8-1/src/package.cpp:118
#22 0x00007fffe7f50bb4 in ros::package::getPlugins (package=..., attribute=..., plugins=..., force_recrawl=<optimized out>) at /build/ros-release-release-noetic-roslib-1.15.8-1/src/package.cpp:155
#23 0x00007fffe5a6ede1 in gazebo::GazeboRosPathsPlugin::LoadPaths (this=<optimized out>) at /home/beezow/opt/nix-ros-overlay/testing/catkin_ws/src/gazebo_ros_pkgs/gazebo_ros/src/gazebo_ros_paths_plugin.cpp:88
#24 0x00007fffe5a6f8db in gazebo::GazeboRosPathsPlugin::GazeboRosPathsPlugin (this=0xd8d860)
    at /home/beezow/opt/nix-ros-overlay/testing/catkin_ws/src/gazebo_ros_pkgs/gazebo_ros/src/gazebo_ros_paths_plugin.cpp:54
#25 gazebo::RegisterPlugin () at /home/beezow/opt/nix-ros-overlay/testing/catkin_ws/src/gazebo_ros_pkgs/gazebo_ros/src/gazebo_ros_paths_plugin.cpp:52
#26 0x00007ffff7fa869e in ?? () from /nix/store/rcj6ykfdv59ixdqdwb2cr6ia6k3rs6kg-gazebo-11.9.0/lib/libgazebo.so.11
#27 0x00007ffff7fa8b87 in ?? () from /nix/store/rcj6ykfdv59ixdqdwb2cr6ia6k3rs6kg-gazebo-11.9.0/lib/libgazebo.so.11
#28 0x00007ffff7f762a8 in gazebo::Server::ParseArgs(int, char**) () from /nix/store/rcj6ykfdv59ixdqdwb2cr6ia6k3rs6kg-gazebo-11.9.0/lib/libgazebo.so.11
#29 0x000000000040a118 in ?? ()
#30 0x00007ffff3b84780 in __libc_start_main () from /nix/store/wl60dr9p15rwf53gxz61ijgisc1zdjc7-glibc-2.33-59/lib/libc.so.6
#31 0x000000000040a1fa in ?? ()

If anyone has any insight into what may be causing this, that would be great. I notice the call stack goes through Boost, so I wonder whether multiple versions are somehow being pulled in. Or maybe something in the rospack dependencies is mucked up?

beezow commented 2 years ago

It seems gazebo was pulling in Boost 1.73, while gazebo_ros was pulling in Boost 1.77. After upgrading gazebo to boost177, the crashes went away. I can finally run gazebo_ros again! You should be able to use the staging branch, as it incorporates all the updates needed.
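A version mismatch like this can be spotted by listing the Boost versions that each binary links against. The following is a minimal sketch, not the method used in this thread: `boost_versions` is a hypothetical helper, and it assumes `ldd` output in the usual glibc form (`libfoo.so.X => /path/libfoo.so.X (0x...)`).

```shell
# Hypothetical helper: read `ldd` output on stdin and print each distinct
# Boost version it references, one per line.
boost_versions() {
  # Pick out names such as libboost_filesystem.so.1.73.0, keep only the
  # version suffix, then de-duplicate.
  grep -o 'libboost_[a-z0-9_]*\.so\.[0-9.]*' | sed 's/.*\.so\.//' | sort -u
}

# Example usage (paths are placeholders, adjust to your build):
#   ldd "$(command -v gzserver)" | boost_versions
#   ldd /path/to/libgazebo_ros_paths_plugin.so | boost_versions
```

If the two commands print different versions, the plugin and the server were built against different Boost releases, which is consistent with the stack trace above passing through boost-1.73.0.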

Can someone else confirm that the staging branch works for gazebo_ros?

The only remaining problem is with the shadows. There are some errors related to the shadow caster. The shadows from light sources do not work, with the light shining right through any obstacles.

lopsided98 commented 2 years ago

I finally got a chance to test this and can confirm that staging works. Staging has now been merged into master.

Do you know if the shadows not working is a new issue?

beezow commented 2 years ago

> I finally got a chance to test this and can confirm that staging works. Staging has now been merged into master.
>
> Do you know if the shadows not working is a new issue?

Not sure if it is new. It seems to be present with both gazebo_9 and gazebo_11 though. Were you able to replicate it?

lopsided98 commented 2 years ago

I don't see any shadows, but I don't see any shadow-related errors either. I did notice that attempting to add more than one light results in Gazebo freezing. It looks like the same problem as https://github.com/osrf/gazebo/issues/2373, and I was able to fix it by using OGRE 1.9.

beezow commented 2 years ago

I tried reverting to OGRE 1.9, and the shadows seem to be working, but I still get the error messages. Additionally, I can confirm that using OGRE 1.9 fixes the crash with multiple light sources.