isl-org / Open3D

Open3D: A Modern Library for 3D Data Processing
http://www.open3d.org

Getting `Illegal instruction (core dumped)` when running Open3D app or `import open3d` in python #6127

Open hradec opened 1 year ago

hradec commented 1 year ago

Describe the issue

For the pip install binary, I tried running it on my standard Arch distro, and then in an Ubuntu 20.04 Docker container: same Illegal instruction error on all of them.

I then built it from source successfully on my Arch distro, and got the same Illegal instruction error with both the Open3D app and `import open3d` in Python.

My CPU is an old Xeon X5675, so maybe I need to change some compiler flags and rebuild from source to get a binary compatible with my CPU's instruction set?

I'm building with LLVM 11 and the standard build options (just running cmake .. in a build sub-folder).

This is my lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   40 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           44
Model name:                      Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz
Stepping:                        2
Frequency boost:                 enabled
CPU MHz:                         3569.871
CPU max MHz:                     3068.0000
CPU min MHz:                     1600.0000
BogoMIPS:                        7359.95
Virtualization:                  VT-x
L1d cache:                       192 KiB
L1i cache:                       192 KiB
L2 cache:                        1.5 MiB
L3 cache:                        12 MiB
NUMA node0 CPU(s):               0-11
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX vulnerable
Vulnerability Mds:               Vulnerable; SMT vulnerable
Vulnerability Meltdown:          Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 
                                 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
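
The flags above include sse4_1 and sse4_2 but no avx entry. A minimal sketch for checking this programmatically, assuming `lscpu`-style text where the `Flags:` value may wrap onto indented continuation lines (the helper name `cpu_flags_from_lscpu` and the shortened sample are hypothetical, not part of Open3D):

```python
import re


def cpu_flags_from_lscpu(text):
    """Collect CPU feature flags from lscpu-style text.

    Assumption: the flags start on a line beginning with "Flags:" and may
    continue on following indented lines that contain no "Key:" prefix.
    """
    flags = set()
    in_flags = False
    for line in text.splitlines():
        if re.match(r"^\s*flags\s*:", line, re.IGNORECASE):
            in_flags = True
            flags.update(line.split(":", 1)[1].split())
        elif in_flags and line[:1].isspace() and ":" not in line:
            # Wrapped continuation of the Flags entry.
            flags.update(line.split())
        else:
            in_flags = False
    return flags


# Shortened sample in the same layout as the output above.
sample = """Model name: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
Flags:      fpu vme sse sse2 pni ssse3
            sse4_1 sse4_2 popcnt aes
"""
flags = cpu_flags_from_lscpu(sample)
print({isa: isa in flags for isa in ("sse4_2", "avx", "avx2")})
```

In practice the text would come from running `lscpu` on the target machine, for example via `subprocess.run(["lscpu"], capture_output=True, text=True).stdout`.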

Steps to reproduce the bug

I'm just building from source and running it. Nothing fancy.

Error message

Running the Open3D app in gdb, this is the output I get:

$ gdb ./Open3D/Open3D 
GNU gdb (GDB) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./Open3D/Open3D...r
done.
(gdb) r
Starting program: /RAID/atomo/home/rhradec/dev/open3d/build-debug/Open3D/Open3D 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x000055555594f30e in _GLOBAL__sub_I_bvh_collider.cpp ()
(gdb) bt
#0  0x000055555594f30e in _GLOBAL__sub_I_bvh_collider.cpp ()
#1  0x000055555cde27cd in __libc_csu_init ()
#2  0x00007ffff73310de in __libc_start_main () from /usr/lib/libc.so.6
#3  0x0000555555950eae in _start ()
(gdb) q

I looked for *collider.cpp files and found embree/src/ext_embree/kernels/bvh/bvh_collider.cpp, so could this be an issue with how embree is built?
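
For the Python side, the same crash can be confirmed without losing the current interpreter by running the import in a child process and inspecting how it ended. This is a rough sketch that assumes a POSIX system, where `subprocess` reports death by signal as a negative return code; the helper name `killed_by` is hypothetical:

```python
import signal
import subprocess
import sys


def killed_by(cmd):
    """Run cmd and return the name of the signal that killed it, else None."""
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    if proc.returncode < 0:
        # On POSIX a negative return code is the negated signal number.
        return signal.Signals(-proc.returncode).name
    return None


if __name__ == "__main__":
    # Prints "SIGILL" if the import dies with an illegal instruction,
    # and None if the child exits on its own (including a plain ImportError).
    print(killed_by([sys.executable, "-c", "import open3d"]))
```

A result of `SIGILL` here matches the `Illegal instruction (core dumped)` message from the shell and the SIGILL reported by gdb above.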

Expected behavior

Just being able to run Open3D or import open3d in python.

Open3D, Python and System information

- Operating system: Ubuntu 20.04 (Docker running on an Arch Linux Distro)
- Python version: 3.8.10 (default, Mar 13 2023, 10:26:41) - [GCC 9.4.0]
- Open3D version: Only get `Illegal instruction (core dumped)` 
- System architecture: x86_64 (Xeon X5675)
- Is this a remote workstation?: no
- How did you install Open3D?: pip / build from source
- Compiler version (if built from source): gcc 10.2.0 / clang+LLVM 11.1.0

Additional information

I've tried building with -DCMAKE_BUILD_TYPE=Debug, but had to remove -Werror strings from cmake/Open3DShowAndAbortOnWarning.cmake to be able to finish building.

I got the same error with the debug build, and didn't get any proper debug output in gdb.

hradec commented 1 year ago

It seems this problem could be related to the AVX instruction set, which is not available on the Xeon X5675. I found this forum thread where embree was crashing when built with AVX support on servers whose CPUs lack that instruction set: https://discourse.paraview.org/t/embree-problem-with-illegal-instruction-in-paraview-5-6-1/2070/2

Also, I found the same problem in this old bug: https://github.com/isl-org/Open3D/issues/5682

So I've modified 3rdparty/embree/embree.cmake to check if the CPU has AVX support using lscpu:

diff --git a/3rdparty/embree/embree.cmake b/3rdparty/embree/embree.cmake
index 4e93eadf1..48c1abeb8 100644
--- a/3rdparty/embree/embree.cmake
+++ b/3rdparty/embree/embree.cmake
@@ -36,17 +36,30 @@ elseif(LINUX_AARCH64)
     set(ISA_LIBS "")
     set(ISA_BUILD_BYPRODUCTS "")
 else() # Linux(x86) and WIN32
-    set(ISA_ARGS -DEMBREE_ISA_AVX=ON
-                 -DEMBREE_ISA_AVX2=ON
-                 -DEMBREE_ISA_AVX512=OFF
-                 -DEMBREE_ISA_SSE2=OFF
-                 -DEMBREE_ISA_SSE42=OFF
-    )
-    # order matters. link libs with increasing ISA order.
-    set(ISA_LIBS embree_avx embree_avx2)
-    set(ISA_BUILD_BYPRODUCTS "<INSTALL_DIR>/${Open3D_INSTALL_LIB_DIR}/${CMAKE_STATIC_LIBRARY_PREFIX}embree_avx${CMAKE_STATIC_LIBRARY_SUFFIX}"
-                             "<INSTALL_DIR>/${Open3D_INSTALL_LIB_DIR}/${CMAKE_STATIC_LIBRARY_PREFIX}embree_avx2${CMAKE_STATIC_LIBRARY_SUFFIX}"
-    )
+    execute_process(COMMAND lscpu COMMAND grep -i flags COMMAND grep -i avx OUTPUT_VARIABLE HAS_AVX)
+    if(HAS_AVX)
+       set(ISA_ARGS -DEMBREE_ISA_AVX=ON
+                    -DEMBREE_ISA_AVX2=ON
+                    -DEMBREE_ISA_AVX512=OFF
+                    -DEMBREE_ISA_SSE2=OFF
+                    -DEMBREE_ISA_SSE42=OFF
+       )
+       # order matters. link libs with increasing ISA order.
+       set(ISA_LIBS embree_avx embree_avx2)
+       set(ISA_BUILD_BYPRODUCTS "<INSTALL_DIR>/${Open3D_INSTALL_LIB_DIR}/${CMAKE_STATIC_LIBRARY_PREFIX}embree_avx${CMAKE_STATIC_LIBRARY_SUFFIX}"
+                                "<INSTALL_DIR>/${Open3D_INSTALL_LIB_DIR}/${CMAKE_STATIC_LIBRARY_PREFIX}embree_avx2${CMAKE_STATIC_LIBRARY_SUFFIX}"
+       )
+    else()
+       set(ISA_ARGS -DEMBREE_ISA_AVX=OFF
+                    -DEMBREE_ISA_AVX2=OFF
+                    -DEMBREE_ISA_AVX512=OFF
+                    -DEMBREE_ISA_SSE2=ON
+                    -DEMBREE_ISA_SSE42=ON
+       )
+       # order matters. link libs with increasing ISA order.
+       set(ISA_LIBS embree_sse42)
+       set(ISA_BUILD_BYPRODUCTS "<INSTALL_DIR>/${Open3D_INSTALL_LIB_DIR}/${CMAKE_STATIC_LIBRARY_PREFIX}embree_sse42${CMAKE_STATIC_LIBRARY_SUFFIX}" )
+    endif()
 endif()

After this change and a clean build, Open3D ran fine (no more illegal instruction), and so did `import open3d` in Python!

So it seems the problem is that Open3D builds embree assuming AVX as the minimum instruction set on Linux x86 systems.
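
Since CMake's `execute_process` runs commands directly rather than through a shell, it can help to verify the selection logic outside CMake first. Below is a small sketch of the same decision the patch makes, using the option and library names from the diff; the function name `embree_isa_args` is hypothetical, and the rule "any avx flag enables both AVX and AVX2" simply mirrors the patch rather than anything Open3D or embree prescribes:

```python
def embree_isa_args(cpu_flags):
    """Return (cmake_args, isa_libs) for a set of CPU flags, as in the patch."""
    if "avx" in cpu_flags:
        enabled = {"AVX": True, "AVX2": True, "AVX512": False,
                   "SSE2": False, "SSE42": False}
        # Order matters: link libs with increasing ISA order.
        libs = ["embree_avx", "embree_avx2"]
    else:
        enabled = {"AVX": False, "AVX2": False, "AVX512": False,
                   "SSE2": True, "SSE42": True}
        libs = ["embree_sse42"]
    args = [
        "-DEMBREE_ISA_{}={}".format(isa, "ON" if on else "OFF")
        for isa, on in enabled.items()
    ]
    return args, libs


# Flags of a CPU without AVX, such as the X5675 above (shortened).
args, libs = embree_isa_args({"sse2", "ssse3", "sse4_1", "sse4_2"})
print(args)
print(libs)
```

For a CPU without AVX this selects the SSE2/SSE4.2 options and links only `embree_sse42`, which is the configuration that ran without the illegal instruction.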

I'm not very good with cmake and git pull requests, but I'll try to change the cmake file to add an option to disable AVX when building on Linux for old CPUs that don't support it. (By the way, I saw that AVX is already disabled in 3rdparty/embree/embree.cmake when building on OSX.)