jbehley / SuMa

Surfel-based Mapping for 3d Laser Range Data (SuMa)
MIT License

Segmentation Fault (OpenGL renderer string: AMD RENOIR) #35

Closed Jimmij50 closed 1 year ago

Jimmij50 commented 2 years ago

Hi jbehley! Thanks for the open source code. I encountered a segmentation fault when trying to run the visualizer. I have checked all the issues posted about Eigen and gtsam. However, I still cannot resolve the segmentation fault when using Eigen 3.2.7/3.3.7 and gtsam 4.0.0 alpha2.

I ran gdb ./visualizer and then used run and bt. The result is shown below.

(gdb) r
Starting program: /home/ziyuli/catkin_ws/src/SuMa/bin/visualizer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff2a4e700 (LWP 78180)]
[New Thread 0x7ffff15f5700 (LWP 78181)]
[New Thread 0x7ffff0c9f700 (LWP 78182)]
[New Thread 0x7fffe4f35700 (LWP 78183)]
[New Thread 0x7fffd7fff700 (LWP 78184)]
[New Thread 0x7fffcf7fe700 (LWP 78185)]
[New Thread 0x7fffd77fe700 (LWP 78186)]
[New Thread 0x7fffd6ffd700 (LWP 78187)]
[New Thread 0x7fffd67fc700 (LWP 78188)]
[New Thread 0x7fffd5ffb700 (LWP 78189)]
[New Thread 0x7fffd57fa700 (LWP 78190)]
[New Thread 0x7fffd4ff9700 (LWP 78191)]
[New Thread 0x7fffcffff700 (LWP 78192)]
[New Thread 0x7fffceffd700 (LWP 78193)]
[New Thread 0x7fffce7fc700 (LWP 78194)]
[New Thread 0x7fffcdffb700 (LWP 78195)]
[New Thread 0x7fffcd7fa700 (LWP 78196)]
[New Thread 0x7fffccff9700 (LWP 78197)]
[New Thread 0x7fff97fff700 (LWP 78198)]
[New Thread 0x7fff977fe700 (LWP 78199)]
[New Thread 0x7fff96ffd700 (LWP 78200)]
[New Thread 0x7fff967fc700 (LWP 78201)]
[New Thread 0x7fff95ffb700 (LWP 78202)]
[New Thread 0x7fff957fa700 (LWP 78203)]
[New Thread 0x7fff94ff9700 (LWP 78204)]
[Thread 0x7fff957fa700 (LWP 78203) exited]
OpenGL Context Version 4.6 core profile
GLEW initialized.
OpenGL context version: 4.6
OpenGL vendor string : AMD
OpenGL renderer string: AMD RENOIR (DRM 3.40.0, 5.11.0-40-generic, LLVM 12.0.0)
[New Thread 0x7fff957fa700 (LWP 78205)]

Thread 9 "visualizer:sh1" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd6ffd700 (LWP 78187)]
0x00007fffe7516582 in ?? () from /lib/x86_64-linux-gnu/libLLVM-12.so.1

and

(gdb) bt
#0 0x00007fffe7516582 in ?? () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#1 0x00007fffe75168e7 in ?? () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#2 0x00007fffe750a79e in ?? () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#3 0x00007fffe75094e9 in ?? () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#4 0x00007fffe738081b in ?? () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#5 0x00007fffe6254811 in llvm::AsmPrinter::emitFunctionBody() () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#6 0x00007fffe72eb58b in ?? () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#7 0x00007fffe5e18e2e in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#8 0x00007fffe5c3636d in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#9 0x00007fffe6accde6 in ?? () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#10 0x00007fffe5c369bf in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /lib/x86_64-linux-gnu/libLLVM-12.so.1
#11 0x00007fffeb187c4a in ?? () from /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#12 0x00007fffeb0b0ba5 in ?? () from /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#13 0x00007fffeb0b2f81 in ?? () from /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#14 0x00007fffeb0af764 in ?? () from /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#15 0x00007fffeb1155a5 in ?? () from /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#16 0x00007fffea833f35 in ?? () from /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#17 0x00007fffea833a7b in ?? () from /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
#18 0x00007ffff662f609 in start_thread (arg=) at pthread_create.c:477
#19 0x00007ffff67df293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

How can I solve this problem? I am looking forward to your help. Thanks a lot!

jbehley commented 2 years ago

It seems that you are using a Radeon card. In theory, I only use OpenGL, so that should not be a problem. However, vendors differ in how strictly they implement the OpenGL standard, and some implement just the "must" parts of the specification. It might be that I failed to remove a part that is optional but still works on Nvidia GPUs. (In the beginning, I could at least check on Intel's integrated GPUs, but currently I only have Nvidia GPUs available.)

Thus, it might be that I use something in OpenGL that the AMD GPU driver does not support and that I'm not aware of. The backtrace is unfortunately not very helpful either. I don't know if one can install a "debug" version of the drivers that would at least give an idea of what code is called, so that one could figure out which command is causing the problem.

(If you have a different system with an Nvidia GPU and do not run it in a virtual machine, that could help...)

Jimmij50 commented 2 years ago

Thanks a lot, @jbehley! Your reply really inspired me to solve the problem.

I did not notice that the OpenGL renderer string is AMD RENOIR. I actually ran the code on a laptop with an AMD Ryzen 9 5900HS and an Nvidia RTX 3070, but by default the system runs the visualizer on the integrated graphics of the AMD CPU.
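For anyone hitting the same crash, a quick way to see which GPU the OpenGL context lands on is to classify the renderer string. The following is only a rough sketch: the function names renderer_vendor and current_renderer are hypothetical, the pattern list is a guess at common renderer strings rather than anything exhaustive, and it assumes glxinfo is installed.

```shell
#!/bin/sh
# Hypothetical helper: map an "OpenGL renderer string" to a coarse vendor label.
# The patterns are assumptions about typical strings, not an exhaustive list.
renderer_vendor() {
  case "$1" in
    *NVIDIA*|*GeForce*|*Quadro*) echo nvidia ;;
    *AMD*|*Radeon*) echo amd ;;
    *Intel*) echo intel ;;
    *) echo unknown ;;
  esac
}

# Hypothetical helper: print the renderer string reported by glxinfo.
# Prints nothing if glxinfo is missing or no display is available.
current_renderer() {
  glxinfo 2>/dev/null | sed -n 's/^OpenGL renderer string: *//p' | head -n 1
}

# Example (run manually on the affected machine):
#   renderer_vendor "$(current_renderer)"
```

If this reports amd on a hybrid laptop, the visualizer is most likely running on the integrated GPU, as was the case in this issue.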

So I first tried DRI_PRIME=1 glxinfo | grep "OpenGL renderer" to switch the renderer to the Nvidia GPU, but it showed:

libGL error: failed to create dri screen
libGL error: failed to load driver: nouveau

I got the same error when trying __GLX_VENDOR_LIBRARY_NAME=nvidia DRI_PRIME=1 glxinfo | grep "OpenGL renderer".

So I booted into recovery mode, selected Root Shell, and ran X -configure followed by cp /root/xorg.conf.new /etc/X11/xorg.conf to generate an xorg.conf file.

Then I ran __GLX_VENDOR_LIBRARY_NAME=nvidia DRI_PRIME=1 ./visualizer to use the Nvidia GPU as the renderer.

And the problem is solved.
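The launch command above can be wrapped in a small shell function so the two variables do not have to be retyped. This is just a sketch of the approach: the name run_on_nvidia is hypothetical, and whether these two variables are sufficient depends on the driver and X configuration of the machine.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with the two environment variables
# used in this thread set only for that command.
run_on_nvidia() {
  env __GLX_VENDOR_LIBRARY_NAME=nvidia DRI_PRIME=1 "$@"
}

# Example (run from the SuMa bin directory):
#   run_on_nvidia ./visualizer
#   run_on_nvidia glxinfo | grep "OpenGL renderer"
```

Using env keeps the variables scoped to the launched program, so other applications in the same shell session continue to use the default GPU.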

Thanks again for your reply!!