opencv / opencv_contrib

Repository for OpenCV's extra modules
Apache License 2.0

SEGFAULT in cv::hal_AVX op_add #1909

Open AronRubin opened 5 years ago

AronRubin commented 5 years ago

#0  cv::hal_AVX2::v256_load_aligned (ptr=0x540afa40)
    at ../modules/core/include/opencv2/core/hal/intrin_avx.hpp:376
#1  0x0000000001fddb30 in cv::hal_AVX2::simd256::vx_load_aligned (ptr=0x540afa40)
    at ../modules/core/include/opencv2/core/hal/intrin.hpp:342
#2  0x0000000001f3a6a4 in cv::hal::opt_AVX2::bin_loader<cv::hal::opt_AVX2::op_add, double, cv::hal_AVX2::v_float64x4>::la (src1=0x540afa40, src2=0x5720cea0, dst=0x542dd340)
    at C:/Users/arubin/git/opencv/modules/core/src/arithm.simd.hpp:344
#3  0x0000000001f5bb91 in cv::hal::opt_AVX2::bin_loop<cv::hal::opt_AVX2::op_add, double, cv::hal_AVX2::v_float64x4> (src1=0x540afa40, step1=0, src2=0x5720cea0, step2=0, dst=0x542dd340, step=0, width=128, height=0)
    at C:/Users/arubin/git/opencv/modules/core/src/arithm.simd.hpp:417
#4  0x0000000001f4b77f in cv::hal::opt_AVX2::add64f (src1=0x540afa40, step1=1, src2=0x5720cea0, step2=1, dst=0x542dd340, step=1, width=128, height=1)
    at C:/Users/arubin/git/opencv/modules/core/src/arithm.simd.hpp:535
#5  0x0000000001f2fb37 in cv::hal::add64f (src1=0x540afa40, step1=1, src2=0x5720cea0, step2=1, dst=0x542dd340, step=1, width=128, height=1)
    at ../modules/core/src/arithm.simd.hpp:535
#6  0x000000000207538c in cv::arithm_op (_src1=..., _src2=..., _dst=..., _mask=..., dtype=14, tab=0x21c1ea0 <cv::getAddTab()::addTab>, muldiv=false, usrdata=0x0, oclop=0)
    at ../modules/core/src/arithm.cpp:857
#7  0x0000000001ecdd5c in cv::add (src1=..., src2=..., dst=..., mask=..., dtype=-1)
    at ../modules/core/src/arithm.cpp:930
#8  0x00000000020e399c in cv::MatOp_AddEx::assign (this=0x21b9450 <cv::g_MatOp_AddEx>, e=..., m=..., _type=-1)
    at ../modules/core/src/matrix_expressions.cpp:1251
#9  0x00000000058b2c15 in cv::MatExpr::operator cv::Mat (this=0x175eb30)
    at C:/Users/arubin/git/opencv/modules/core/include/opencv2/core/mat.inl.hpp:3416
#10 0x0000000005889b00 in cv::face::FacemarkLBFImpl::fitImpl (this=0x447c4ff0, image=..., landmarks=...)
    at C:/Users/arubin/git/opencv_contrib/modules/face/src/facemarkLBF.cpp:432
#11 0x0000000005889485 in cv::face::FacemarkLBFImpl::fit (this=0x447c4ff0, image=..., roi=..., _landmarks=...)
    at C:/Users/arubin/git/opencv_contrib/modules/face/src/facemarkLBF.cpp:386
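As a quick sanity check on the frames above: 256-bit AVX aligned loads require a 32-byte aligned pointer, and the faulting frame is an aligned load. The sketch below tests the three addresses reported for the `la` frame. The helper `is_aligned` and the constant names are made up for this illustration and are not OpenCV identifiers. It assumes `v256_load_aligned` has the usual 32-byte requirement.

```cpp
#include <cstdint>

// Hypothetical helper (not an OpenCV function): true when `addr` is a
// multiple of `alignment`, which must be a power of two.
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Addresses copied from the bin_loader<...>::la frame of the backtrace.
constexpr std::uint64_t kSrc1 = 0x540afa40;
constexpr std::uint64_t kSrc2 = 0x5720cea0;
constexpr std::uint64_t kDst  = 0x542dd340;

// Assumption: the aligned 256-bit load needs 32-byte alignment.
constexpr std::uint64_t kAvxAlignment = 32;

// Compilation fails if any of the reported pointers is misaligned.
static_assert(is_aligned(kSrc1, kAvxAlignment), "src1 is not 32-byte aligned");
static_assert(is_aligned(kSrc2, kAvxAlignment), "src2 is not 32-byte aligned");
static_assert(is_aligned(kDst,  kAvxAlignment), "dst is not 32-byte aligned");
```

The file has no `main`; compiling it (for example with `g++ -std=c++11 -fsyntax-only`) is the whole check. All three assertions hold, so under the stated assumption a misaligned operand pointer does not by itself seem to explain the fault at that load.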

In the frames above, src1 is a Mat (originally a UMat):

$3 = {flags = 1124024334, dims = 2, rows = 68, cols = 1, data = 0x540afa40 "X3\023p▒▒L@r▒HVp▒f@*\b8/ZN@yh▒▒R\200l@▒\003\064#i▒P@x/0=e\016q@&n▒▒▒HS@▒▒\023▒▒\024t@X7>▒▒qX@:6▒ph▒v@br\032\064▒I@", datastart = 0x540afa40 "X3\023p▒▒L@r▒HVp▒f@*\b8/ZN@yh▒▒R\200l@▒\003\064#i▒P@x/0=e\016q@&n▒▒▒HS@▒▒\023▒▒\024t@X7>▒▒qX@:6▒ph▒v@br\032\064▒I@", dataend = 0x540afe80 "651125,3hXX\r▒▒\022", datalimit = 0x540afe80 "651125,3hXX\r▒▒\022", allocator = 0x0, u = 0x56270cf0, size = {p = 0x175ddc8}, step = {p = 0x175de10, buf = {16, 16}}}

And src2 is a Scalar/Matx:

$4 = {flags = 1124024326, dims = 2, rows = 4, cols = 1, data = 0x175ec70 "", datastart = 0x175ec70 "", dataend = 0x175ec90 "", datalimit = 0x175ec90 "", allocator = 0x0, u = 0x0, size = {p = 0x175dd68}, step = {p = 0x175ddb0, buf = {8, 8}}}
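The `flags` fields in these dumps pack the element type into their low bits. The sketch below decodes them without linking OpenCV. It assumes the OpenCV 4 layout behind the `CV_MAT_DEPTH` and `CV_MAT_CN` macros: depth in the low 3 bits, channels minus one in the next 9 bits, and the continuity flag at bit 14. The constant and function names are illustrative, not OpenCV identifiers.

```cpp
#include <cstdint>

// Assumed layout of cv::Mat::flags (OpenCV 4.x), restated here so the
// example builds without OpenCV headers.
constexpr std::uint32_t kDepthMask      = 0x7;       // low 3 bits: depth, 6 means CV_64F
constexpr std::uint32_t kChannelShift   = 3;         // next 9 bits: channels - 1
constexpr std::uint32_t kChannelMask    = 0x1FF;
constexpr std::uint32_t kContinuousFlag = 1u << 14;  // data stored without row gaps

constexpr int depth_of(std::uint32_t flags) {
    return static_cast<int>(flags & kDepthMask);
}
constexpr int channels_of(std::uint32_t flags) {
    return static_cast<int>((flags >> kChannelShift) & kChannelMask) + 1;
}
constexpr bool is_continuous(std::uint32_t flags) {
    return (flags & kContinuousFlag) != 0;
}

// Values copied from the $3 (src1) and $4 (src2) dumps.
constexpr std::uint32_t kSrc1Flags = 1124024334u;
constexpr std::uint32_t kSrc2Flags = 1124024326u;

static_assert(depth_of(kSrc1Flags) == 6 && channels_of(kSrc1Flags) == 2,
              "src1 does not decode as 64-bit float, 2 channels");
static_assert(depth_of(kSrc2Flags) == 6 && channels_of(kSrc2Flags) == 1,
              "src2 does not decode as 64-bit float, 1 channel");
static_assert(is_continuous(kSrc1Flags) && is_continuous(kSrc2Flags),
              "both headers should be marked continuous");

// 68 rows of 16 bytes (two doubles per row) should span dataend - data.
static_assert(68 * 16 == 0x540afe80 - 0x540afa40,
              "src1 buffer length does not match rows * step");
```

Under that assumed layout, src1 decodes as a continuous 68x1 matrix of two-channel doubles (CV_64FC2) and src2 as a 4x1 single-channel double matrix, which is what a Scalar looks like. The packed type value for depth 6 with 2 channels is 14, which matches `dtype=14` in the `cv::arithm_op` frame.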

System information follows:

$ opencv_version_win32d

General configuration for OpenCV 4.0.0-pre =====================================
  Version control: 4.0.0-rc-25-gf81370232-dirty

Extra modules:
  Location (extra): C:/Users/arubin/git/opencv_contrib/modules
  Version control (extra): 4.0.0-rc-2-gec4d5c85-dirty

Platform:
  Timestamp: 2018-09-14T16:15:53Z
  Host: Windows 10.0.17134 AMD64
  CMake: 3.13.0-rc3
  CMake generator: Ninja
  CMake build tool: C:/msys64/mingw64/bin/ninja.exe
  Configuration: Debug

CPU/HW features:
  Baseline: SSE SSE2 SSE3
    requested: SSE3
  Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2
    requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
    SSE4_1 (5 files): + SSSE3 SSE4_1
    SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
    FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
    AVX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
    AVX2 (11 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2

C/C++:
  Built as dynamic libs?: YES
  C++ Compiler: C:/msys64/mingw64/bin/g++.exe (ver 8.2.0)
  C++ flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG -g1
  C++ flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
  C Compiler: C:/msys64/mingw64/bin/gcc.exe
  C flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG -g1
  C flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
  Linker flags (Release): -Wl,--gc-sections
  Linker flags (Debug): -Wl,--gc-sections
  ccache: NO
  Precompiled headers: YES
  Extra dependencies: opengl32 glu32 C:/msys64/opt/halide/lib/libHalide.a z
  3rdparty dependencies:

OpenCV modules:
  To be built: aruco bgsegm bioinspired calib3d ccalib core cvv datasets dnn dnn_objdetect dpm face features2d flann fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc java_bindings_generator line_descriptor ml objdetect optflow phase_unwrapping photo plot reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab viz xfeatures2d ximgproc xobjdetect xphoto
  Disabled: ovis python3 python_bindings_generator world
  Disabled by dependency: -
  Unavailable: cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev freetype java js matlab python2 python2 sfm
  Applications: apps
  Documentation: NO
  Non-free algorithms: YES

Windows RT support: NO

GUI:
  QT: YES (ver 5.11.2)
    QT OpenGL support: YES (Qt5::OpenGL 5.11.2)
  Win32 UI: YES
  OpenGL support: YES (opengl32 glu32)
  VTK support: YES (ver 8.1.1)

Media I/O:
  ZLib: build (ver 1.2.11)
  JPEG: build-libjpeg-turbo (ver 1.5.3-62)
  WEBP: build (ver encoder: 0x020e)
  PNG: build (ver 1.6.35)
  TIFF: build (ver 42 - 4.0.9)
  JPEG 2000: build (ver 1.900.1)
  OpenEXR: build (ver 1.7.1)
  HDR: YES
  SUNRASTER: YES
  PXM: YES
  PFM: YES

Video I/O:
  DC1394: NO
  FFMPEG: YES (prebuilt binaries)
    avcodec: YES (ver 58.35.100)
    avformat: YES (ver 58.20.100)
    avutil: YES (ver 56.22.100)
    swscale: YES (ver 5.3.100)
    avresample: YES (ver 4.0.0)
  GStreamer:
    base: YES (ver 1.0)
    video: YES (ver 1.0)
    app: YES (ver 1.0)
    riff: YES (ver 1.0)
    pbutils: YES (ver 1.0)
  DirectShow: YES

Parallel framework: none

Trace: YES (built-in)

Other third-party libraries:
  Lapack: YES (C:/msys64/mingw64/lib/libopenblas.dll.a)
  Halide: YES (C:/msys64/opt/halide/lib/libHalide.a C:/msys64/opt/halide/include)
  Eigen: YES (ver 3.3.5)
  Custom HAL: NO
  Protobuf: build (3.5.1)

OpenCL: YES (SVM)
  Include path: C:/Users/arubin/git/opencv/3rdparty/include/opencl/1.2
  Link libraries: Dynamic load

Python (for build): C:/msys64/mingw64/bin/python2.7.exe

Java:
  ant: NO
  JNI: C:/Program Files/Java/jdk1.8.0_161/include C:/Program Files/Java/jdk1.8.0_161/include/win32 C:/Program Files/Java/jdk1.8.0_161/include
  Java wrappers: NO
  Java tests: NO

Install to: C:/msys64/mingw64

[ INFO:0] Initialize OpenCL runtime...
OpenCL Platforms:
  Intel(R) OpenCL
    iGPU: Intel(R) HD Graphics 630 (OpenCL 2.1 NEO )
    CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (OpenCL 2.1 (Build 611))
  NVIDIA CUDA
    dGPU: GeForce GTX 1070 (OpenCL 1.2 CUDA)
Current OpenCL device:
  Type = iGPU
  Name = Intel(R) HD Graphics 630
  Version = OpenCL 2.1 NEO
  Driver version = 23.20.16.4973
  Address bits = 64
  Compute units = 24
  Max work group size = 256
  Local memory size = 64 KB
  Max memory allocation size = 3 GB 179 MB 418 KB
  Double support = Yes
  Host unified memory = Yes
  Device extensions:
    cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_simultaneous_sharing
  Has AMD Blas = No
  Has AMD Fft = No
  Preferred vector width char = 16
  Preferred vector width short = 8
  Preferred vector width int = 4
  Preferred vector width long = 1
  Preferred vector width float = 1
  Preferred vector width double = 1
OpenCV's HW features list:
  ID= 1 (MMX) -> ON
  ID= 2 (SSE) -> ON
  ID= 3 (SSE2) -> ON
  ID= 4 (SSE3) -> ON
  ID= 5 (SSSE3) -> ON
  ID= 6 (SSE4.1) -> ON
  ID= 7 (SSE4.2) -> ON
  ID= 8 (POPCNT) -> ON
  ID= 9 (FP16) -> ON
  ID= 10 (AVX) -> ON
  ID= 11 (AVX2) -> ON
  ID= 12 (FMA3) -> ON
Total available: 12

berak commented 5 years ago

@AronRubin, hello to a fellow mingw user ;) It seems that the AVX2 instructions used in the OpenCV code don't compile correctly with mingw64.

I'm still using 7.2.0 and you're on 8.2.0, so things might have improved in the meantime, but I had to set the cmake CPU_DISPATCH option to an empty list to compile all of the opencv + contrib modules properly.
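To illustrate why emptying the dispatch list sidesteps the crashing kernel, here is a toy model of runtime CPU dispatch. It is not OpenCV's dispatch code: the `Kernel` enum, the feature bits and `select_kernel` are hypothetical names invented for this sketch. It assumes that a dispatched kernel is used only when it was both built and supported by the CPU, with the baseline build as fallback.

```cpp
#include <cstdint>

// Hypothetical feature bits, for this illustration only.
constexpr std::uint32_t kSse41 = 1u << 0;
constexpr std::uint32_t kAvx   = 1u << 1;
constexpr std::uint32_t kAvx2  = 1u << 2;

enum class Kernel { Baseline, Sse41, Avx, Avx2 };

// Picks the widest kernel that was built (`built`, standing in for the
// dispatch list) and that the CPU reports as available (`cpu`).
// Falls back to the baseline kernel when nothing qualifies.
constexpr Kernel select_kernel(std::uint32_t built, std::uint32_t cpu) {
    return ((built & cpu) & kAvx2)  ? Kernel::Avx2
         : ((built & cpu) & kAvx)   ? Kernel::Avx
         : ((built & cpu) & kSse41) ? Kernel::Sse41
         : Kernel::Baseline;
}

// The reporter's HW features list shows SSE4.1, AVX and AVX2 as ON.
constexpr std::uint32_t kReporterCpu = kSse41 | kAvx | kAvx2;

// Build as reported: AVX2 is dispatched, so the AVX2 kernel is chosen,
// consistent with the opt_AVX2::add64f frame in the backtrace.
static_assert(select_kernel(kSse41 | kAvx | kAvx2, kReporterCpu) == Kernel::Avx2,
              "expected the AVX2 kernel");

// Empty dispatch list: only the baseline kernel exists, whatever the CPU supports.
static_assert(select_kernel(0, kReporterCpu) == Kernel::Baseline,
              "expected the baseline kernel");
```

In this model an empty dispatch list means the AVX2 code path is never built, so it can never be selected at run time. The cost, presumably, is losing the speedup of the wider kernels on CPUs that support them.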

(Also remember that the devs gave up on mingw way back, after opencv 2.4.6. There were far too many different versions of it to support sustainably, and that is still true now, sad as it is.)