jetpacapp / DeepBeliefSDK

The SDK for Jetpac's iOS Deep Belief image recognition framework

AndroidExample crashes on Android 4.2 4.3 #12

Open ziggyJ opened 10 years ago

ziggyJ commented 10 years ago

I found the crash dump in the log. It seems the crash occurs when the matrix is freed in the function cblas_sgemm_fixed.

I set libc.debug.malloc to 10 and it reported a rear guard mismatch over 20 bytes.

E/libc ( 5319): +++ REAR GUARD MISMATCH [0, 7)
E/libc ( 5319): +++ REAR GUARD MISMATCH [8, 11)
E/libc ( 5319): +++ REAR GUARD MISMATCH [12, 20)
E/libc ( 5319): +++ ALLOCATION 0x4dfff0c8 SIZE 92928 HAS A CORRUPTED REAR GUARD
E/libc ( 5319): +++ ALLOCATION 0x4dfff0c8 SIZE 92928 ALLOCATED HERE:
E/libc ( 5319): * * * * * * * * * * * * * * * *
E/libc ( 5319): #00 pc 0000d7e0 /system/lib/libc_malloc_debug_leak.so (chk_malloc+0x17)
E/libc ( 5319): #01 pc 0000dcc6 /system/lib/libc.so (malloc+0x9)
E/libc ( 5319): #02 pc 4e224cce libjpcnn.so (cblas_sgemm_fixed(int, int, int, int, int, int, float, void, float, float, int, int, float_, int, float, float*, int)+0x5d)

It seems some code in the cblas_sgemm_fixed function corrupts the allocated memory. The AndroidExample does not crash on Android 4.1 and 4.4, but I still see the rear guard mismatch message there, which means the memory is still being corrupted and the app only avoids crashing by luck.
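To illustrate what a rear guard mismatch indicates, here is a minimal sketch of the general idea behind guard bytes. It is not bionic's actual implementation: GuardedBuffer, kGuardByte, and kGuardSize are hypothetical names, and the guard size and pattern are arbitrary assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical guard parameters, chosen only for illustration.
const unsigned char kGuardByte = 0xAA;
const std::size_t kGuardSize = 32;

// A buffer with extra bytes after the payload filled with a known pattern.
// If code writes past the end of the payload, the pattern changes and the
// overrun can be detected later (for example, when the buffer is freed).
struct GuardedBuffer {
  std::vector<unsigned char> storage;
  std::size_t payloadSize;

  explicit GuardedBuffer(std::size_t size)
      : storage(size + kGuardSize, kGuardByte), payloadSize(size) {}

  unsigned char* data() { return storage.data(); }

  // Returns true if every guard byte after the payload is still untouched.
  bool rearGuardIntact() const {
    for (std::size_t i = payloadSize; i < storage.size(); ++i) {
      if (storage[i] != kGuardByte) {
        return false;
      }
    }
    return true;
  }
};
```

A write to data()[payloadSize] lands in the guard region, so rearGuardIntact() returns false afterwards. This matches the situation described above: the overrun happens on every Android version, but whether it causes a crash depends on what sits after the allocation.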

The following is the crash dump.

12-31 13:09:57.465 3701-3701/? A/libc﹕ @@@ ABORTING: LIBC: ARGUMENT IS INVALID HEAP ADDRESS IN dlfree addr=0x5121e000
12-31 13:09:57.465 3701-3701/? A/libc﹕ Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 3701 (com.example.cam)
12-31 13:09:57.530 1850-1850/? I/DEBUG﹕ * * * * * * * * * * * * * * * *
12-31 13:09:57.530 1850-1850/? I/DEBUG﹕ Build fingerprint: 'samsung/GT-I9100/GT-I9100:4.0.3/IML74K/XXLPQ:user/release-keys'
12-31 13:09:57.530 1850-1850/? I/DEBUG﹕ Revision: '10'
12-31 13:09:57.530 1850-1850/? I/DEBUG﹕ pid: 3701, tid: 3701, name: com.example.cam >>> com.example.cam <<<
12-31 13:09:57.530 1850-1850/? I/DEBUG﹕ signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbaad
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ r0 00000055 r1 bed42618 r2 00000003 r3 deadbaad
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ r4 40092228 r5 5121e000 r6 bed42640 r7 4008585a
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ r8 5121e008 r9 4f585588 sl 00000020 fp 51229588
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ ip 00000000 sp bed42640 lr 40071c69 pc 4005600c cpsr 00010030
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ d0 c1790e403eaf0c38 d1 41e4dcdfc1b49f9e
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ d2 0000000000000000 d3 0000000000000000
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ d4 bd22a1283c5bada0 d5 3ab6f0003c033c40
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ d6 8000000000000000 d7 3f80000000000000
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ d8 bec96d47bec96d47 d9 bec96d47bec96d47
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ d10 3750324e3750324e d11 3750324e3750324e
12-31 13:09:57.900 1850-1850/? I/DEBUG﹕ d12 0000000000000000 d13 0000000000000000
12-31 13:09:57.905 1850-1850/? I/DEBUG﹕ d14 0000000000000000 d15 0000000000000000
12-31 13:09:57.905 1850-1850/? I/DEBUG﹕ d16 42253b40c209c66b d17 c367fa72c2e77db0
12-31 13:09:57.910 1850-1850/? I/DEBUG﹕ d18 0000000000000000 d19 0000000000000000
12-31 13:09:57.910 1850-1850/? I/DEBUG﹕ d20 0000000000000000 d21 0000000000000000
12-31 13:09:57.910 1850-1850/? I/DEBUG﹕ d22 419033abc203c170 d23 c10d9a20c198f3cc
12-31 13:09:57.910 1850-1850/? I/DEBUG﹕ d24 3d88e5243d10bc18 d25 bda932d03d97ea44
12-31 13:09:57.910 1850-1850/? I/DEBUG﹕ d26 3f8000003f800000 d27 3f8000003f800000
12-31 13:09:57.910 1850-1850/? I/DEBUG﹕ d28 4051a513c03de1bc d29 41694fcd40c6fd80
12-31 13:09:57.910 1850-1850/? I/DEBUG﹕ d30 c1790e403eaf0c38 d31 41e4dcdfc1b49f9e
12-31 13:09:57.910 1850-1850/? I/DEBUG﹕ scr 60000010
12-31 13:09:57.925 1850-1850/? I/DEBUG﹕ backtrace:
12-31 13:09:57.925 1850-1850/? I/DEBUG﹕ #00 pc 0000f00c /system/lib/libc.so
12-31 13:09:57.925 1850-1850/? I/DEBUG﹕ #01 pc 00011db3 /system/lib/libc.so (dlfree+1458)
12-31 13:09:57.925 1850-1850/? I/DEBUG﹕ #02 pc 0000cf73 /system/lib/libc.so (free+10)
12-31 13:09:57.925 1850-1850/? I/DEBUG﹕ #03 pc 0000fe5b /data/app-lib/com.example.cam-1/libjpcnn.so (cblas_sgemm_fixed(int, int, int, int, int, int, float, void, float, float, int, int, float, int, float, float, int)+490)
12-31 13:09:57.925 1850-1850/? I/DEBUG﹕ #04 pc 0000d61b /data/app-lib/com.example.cam-1/libjpcnn.so (matrix_correlate(Buffer, Buffer*, int, int, int, bool)+1514)

ziggyJ commented 10 years ago

I removed the -O3 flag from the Android.mk file, and now it reports:

F/libc ( 4776): ./src/lib/math/matrix_gemm.cpp:424: void cblas_sgemm_fixed(int, int, int, int, int, int, jpfloatt, void, jpfloat_t, jpfloat_t, int, int, jpfloatt, int, jpfloat_t, jpfloat_t*, int): assertion "(k % 8) == 0" failed
F/libc ( 4776): Fatal signal 6 (SIGABRT) at 0x000012a8 (code=-6), thread 4776 (com.example.cam)

ziggyJ commented 10 years ago

More logs: it seems that k=363 is what causes the crash.

I/stderr ( 9110): matrix_correlate[GEMM](input=[Buffer prepareInput_output - (1, 224, 224, 3), 32 bits per element, range (0.000000-1.000000)], kernels=[Buffer None - (96, 363), 16 bits per element, range (-0.393412-0.419856)], kernelWidth=11, kernelCount=96, stride=4)
I/stderr ( 9110): patches_into_rows(input=[Buffer prepareInput_output - (1, 224, 224, 3), 32 bits per element, range (0.000000-1.000000)], kernelWidth=11, stride=4)
I/stderr ( 9110): patches_into_rows() result=[Buffer None - (1, 3025, 363), 32 bits per element, range (0.000000-1.000000)]
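The arithmetic behind the failed assertion can be checked directly from the logged values: an 11x11 kernel over 3 channels gives k = 363, which is not a multiple of 8. The sketch below only verifies that arithmetic; roundUpToMultiple is a hypothetical helper, and padding k up to 368 is one possible workaround, not something the SDK is known to do.

```cpp
#include <cstddef>

// Hypothetical helper: round n up to the next multiple of `multiple`.
constexpr std::size_t roundUpToMultiple(std::size_t n, std::size_t multiple) {
  return ((n + multiple - 1) / multiple) * multiple;
}

// Values taken from the matrix_correlate log line above.
constexpr std::size_t kKernelWidth = 11;
constexpr std::size_t kChannels = 3;

// Each patch row holds kernelWidth * kernelWidth * channels values.
constexpr std::size_t kK = kKernelWidth * kKernelWidth * kChannels;

static_assert(kK == 363, "matches the second dimension of the kernels buffer");
static_assert(kK % 8 == 3, "so the (k % 8) == 0 assertion fails");
static_assert(roundUpToMultiple(kK, 8) == 368, "next size the assertion accepts");
```

If the rows were padded this way, the extra 5 columns would need to be zero-filled so they do not change the result of the multiplication.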

petewarden commented 10 years ago

I believe the likely cause is the NEON-optimized code running past the end of the buffer on 'odd-sized' matrices. I've had luck on other platforms with Eigen's GEMM as an alternative. I will take a look at this again once I have an Android device handy, but in the meantime you could try compiling with that instead.

ziggyJ commented 10 years ago

Do you mean that I need to remove the -DUSE_NEON flag? It seems the current Android build already uses Eigen's GEMM: I can see LOCAL_CFLAGS := -DUSE_EIGEN_GEMM -DUSE_NEON in Android.mk.

petewarden commented 10 years ago

From memory I believe so, but I don't have an Android dev environment handy to try it to be sure.

ziggyJ commented 10 years ago

I removed -DUSE_NEON and it works! It is just a bit slower (~20%, I think) than with the NEON-optimized code enabled.

Thank you for the help.