rgbd: segmentation fault on 32bit Arm platforms

tomoaki0705 commented 3 years ago

System information (version)

OpenCV => recent master (674ecc5) + contrib (0def4736191800fd7ab67550b7126dc2ca5871ef)
Operating System / Platform => Raspberry Pi 3, ODROID-XU4
Compiler => GCC 5.4.0, GCC 6.3.0

Detailed description

Test opencv_test_rgbd fails with Segmentation fault

$ OPENCV_OPENCL_DEVICE=disabled GTEST_FILTER=TSDF*:KinectFusion* ./bin/opencv_test_rgbd
CTEST_FULL_OUTPUT
OpenCV version: 4.5.1-dev
OpenCV VCS version: 4.5.1-159-g674ecc5
Build type: Debug
Compiler: /usr/bin/c++  (ver 5.4.0)
Parallel framework: pthreads (nthreads=8)
CPU features: NEON FP16
[ INFO:0] global /opencv-fork/modules/core/src/ocl.cpp (1176) haveOpenCL Initialize OpenCL runtime...
[ INFO:0] global /opencv-fork/modules/core/src/ocl.cpp (1182) haveOpenCL OpenCL: found 1 platforms
[ INFO:0] global /opencv-fork/modules/core/src/ocl.cpp (974) getInitializedExecutionContext OpenCL: initializing thread execution context
[ INFO:0] global /opencv-fork/modules/core/src/ocl.cpp (984) getInitializedExecutionContext OpenCL: creating new execution context...
[ INFO:0] global /opencv-fork/modules/core/src/ocl.cpp (1012) getInitializedExecutionContext OpenCL: context is not available/disabled
OpenCL is disabled
TEST: Skip tests with tags: 'mem_2gb', 'verylong', 'debug_verylong'
Note: Google Test filter = TSDF*:KinectFusion*
[==========] Running 8 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from KinectFusion
[ RUN      ] KinectFusion.lowDense
Segmentation fault

I could see that this was NOT happening on Aarch64 platforms, but only on Arm 32bit platforms.

Tracing with GDB, the access violation was happening here

https://github.com/opencv/opencv_contrib/blob/0def4736191800fd7ab67550b7126dc2ca5871ef/modules/rgbd/src/tsdf.cpp#L267-L268

The index was sometime negative value, sometime larger than the size of volData Tracing back, the index was decided by the coordinate ix, iy and iz

https://github.com/opencv/opencv_contrib/blob/0def4736191800fd7ab67550b7126dc2ca5871ef/modules/rgbd/src/tsdf.cpp#L256

And I could confirm that sometimes, one of them were near the boundary. This function is checking the difference between the neighbors, so the boundary has to be checked strictly. It was done at the beginning of the function

https://github.com/opencv/opencv_contrib/blob/0def4736191800fd7ab67550b7126dc2ca5871ef/modules/rgbd/src/tsdf.cpp#L234-L238

The comparison will return 0xffffffff for true or 0 for false. If an index is pointing outside of the boundary, addition between 0xffffffff and 0 happens. This type is v_float32x4 so the addition happens in floating-point arithmetic, which is -Inf + 0. This results in 0xffffffff on Aarch64 platforms, so taking v_check_any which checks the sign bit of each element, will grab if the comparison was true or not, correctly. On Arm 32bit, the result of -Inf + 0 becomes NaN, which is 0x7fc00000 Taking v_check_any of this will always end up in 0, regardless the result of comparison. As far as I can see, this behavior seems standard (expected) behavior on Arm 32bit platform.

Now, changing the checking part as following let the test pass (i.e. taking or of each v_check_any, not taking v_check_any of add)

-    if(v_check_any((p < v_float32x4(1.f, 1.f, 1.f, 0.f)) +
-                   (p >= v_float32x4((float)(volResolution.x-2),
+    if(v_check_any (p < v_float32x4(1.f, 1.f, 1.f, 0.f)) ||
+       v_check_any (p >= v_float32x4((float)(volResolution.x-2),
                                      (float)(volResolution.y-2),
                                      (float)(volResolution.z-2), 1.f))
-                   ))
+                   )

Test on Raspberry Pi after the modification

$ GTEST_FILTER=KinectFusion*:TSDF.* ./bin/opencv_test_rgbd
CTEST_FULL_OUTPUT
OpenCV version: 4.5.1-dev
OpenCV VCS version: 4.5.1-159-g674ecc5
Build type: Release
Compiler: /usr/bin/c++  (ver 6.3.0)
Parallel framework: pthreads (nthreads=4)
CPU features: NEON FP16
OpenCL is disabled
TEST: Skip tests with tags: 'mem_2gb', 'verylong'
Note: Google Test filter = KinectFusion*:TSDF.*
[==========] Running 8 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from KinectFusion
[ RUN      ] KinectFusion.lowDense
[       OK ] KinectFusion.lowDense (22234 ms)
[ RUN      ] KinectFusion.highDense
[       OK ] KinectFusion.highDense (63894 ms)
[ RUN      ] KinectFusion.inequal
[       OK ] KinectFusion.inequal (21728 ms)
[ RUN      ] KinectFusion.OCL
[       OK ] KinectFusion.OCL (44466 ms)
[----------] 4 tests from KinectFusion (152322 ms total)

[----------] 4 tests from TSDF
[ RUN      ] TSDF.raycast_normals
[       OK ] TSDF.raycast_normals (1082 ms)
[ RUN      ] TSDF.fetch_points_normals
[       OK ] TSDF.fetch_points_normals (572 ms)
[ RUN      ] TSDF.fetch_normals
[       OK ] TSDF.fetch_normals (571 ms)
[ RUN      ] TSDF.valid_points
[       OK ] TSDF.valid_points (918 ms)
[----------] 4 tests from TSDF (3144 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 2 test cases ran. (155467 ms total)
[  PASSED  ] 8 tests.

I'll send a patch shortly

Steps to reproduce

Run opencv_test_rgbd on Arm 32bit platform

Issue submission checklist

[x] I report the issue, it's not a question
[x] I checked the problem with documentation, FAQ, open issues, answers.opencv.org, Stack Overflow, etc and have not found solution
[x] I updated to latest OpenCV version and the issue is still there
[x] There is reproducer code and related data files: videos, images, onnx, etc

savuor commented 3 years ago

Yes, a result of register comparison expected to be an integer register, so reinterpreting comparison as integer could also help. By the way, 0xff ff ff ff is NaN (to be correct, -NaN), not -Inf. Is it true that there'e an implicit conversion between different NaNs in calculations?

tomoaki0705 commented 3 years ago

Thanks for the review @savuor

Yes, 0xffffffff is -NaN, not -Inf. Thanks for pointing.
Technically, it's not a conversion. On Arm 32bit platform, any computation which becomes NaN ends up in 0x7fc00000

unsigned int inputZero[] = {0, 0, 0, 0};
unsigned int inputNaN[] = {0xffffffff, 0x7fc00000, 0x7fc00001, 0x7fffffff};
unsigned int resultSIMD[4];
float32x4_t a = vld1q_f32((float*)inputZero);
float32x4_t b = vld1q_f32((float*)inputNaN);
vst1q_f32((float*)resultSIMD, vaddq_f32(a, b));
for(int i = 0;i < 4;i++)
{
    std::cout << std::hex << "0x" << resultSIMD[i] << std::endl;
}

ends up in

0x7fc00000
0x7fc00000
0x7fc00000
0x7fc00000

Same code becomes as below in 64bit

0xffffffff
0x7fc00000
0x7fc00001
0x7fffffff

This behavior is exactly same using NEON or w/o NEON. I think there was a rules for this NaN + 0 in IEEE 754, so I feel either platform is breaking this standard, but I didn't dig any deeper.

opencv / opencv_contrib