projectNe10 / Ne10

An open optimized software library project for the ARM® Architecture
Other
1.46k stars 408 forks source link

When I integrated into My Android Project, there was an Error. #167

Open KevinKingZhan opened 7 years ago

KevinKingZhan commented 7 years ago

Hi, Joe Savage. I intergrated Ne10 into My Android Project, there was an Error Occured. The Log is 2017-06-05 4 20 29 This Crash is not happened every time. it is happened by Accident. I don't have any ideas. Please help me, Thank you!

lieff commented 7 years ago

Stack seems corrupted. Or fft routine called from signal handler?

KevinKingZhan commented 7 years ago

I use Compile Flag :-mfloat-abi=hard -mfpu=vfp3 in My Build shared Library. is this flag impact some neonintrinsic Function(ex:vmlaq_f32, vst2q_f32 etc.)?

lieff commented 7 years ago

mfloat-abi impact float parameters passing between functions, mfpu impact code generation. I usually use -mfpu=neon -ffast-math -fvectorize for compiler neon code generation. Also I do not see any *_dmul runtime functions calling from ne10, NE10_fft_generic_float32.neonintrinsic.cpp.o only imports

         U __aeabi_idiv
         U __aeabi_unwind_cpp_pr0

in my case. Here how I build ne10:

mkdir build && cd build
export ANDROID_NDK=~/Android/Sdk/ndk-bundle
export NE10_ANDROID_TARGET_ARCH=armv7
cmake -DCMAKE_TOOLCHAIN_FILE=../android/android_config.cmake ..
make
KevinKingZhan commented 7 years ago

Thank you lieff, I Use Compile Flag : -mfloat-abi=softfp -mfpu=neon And The Error was happened by Accident.

The Method Stack Trace is :

0  libsongstudio.so!ne10_mixed_radix_fft_forward_float32_neon + 0x5d
     r0 = 0x8f826118    r1 = 0x94438000    r2 = 0x00000002    r3 = 0x8f81a118
     r4 = 0x00000006    r5 = 0x00000400    r6 = 0x00000004    r7 = 0x9443a010
     r8 = 0x9443c010    r9 = 0x9443e010   r10 = 0x94440000   r12 = 0x00000004
     fp = 0x8f82e0d8    sp = 0x9483bad8    lr = 0x8f826118    pc = 0x9c78549e
    Found by: given as instruction pointer in context
 1  libsongstudio.so!ne10_fft_c2c_1d_float32_neon [NE10_fft_float32.neon.c : 748 + 0xf]
     sp = 0x9483bb40    pc = 0x9c766913
    Found by: stack scanning
 2  libsongstudio.so!ne10_fft_r2c_1d_float32_neon [NE10_fft_float32.neon.c : 788 + 0xb]
     r4 = 0x8f826118    r5 = 0x94428000    r6 = 0x8f81a000    r7 = 0x8f81a018
     r8 = 0x94438000    sp = 0x9483bbf0    pc = 0x9c766995
    Found by: call frame info
 3  libsongstudio.so!NeonFFTAccelerate::fft_forward [neon_fft_accelerate.cpp : 34 + 0xd]
     r4 = 0x9845eb20    r5 = 0x9cb1bb2c    r6 = 0x00002000    r7 = 0x8784a018
     r8 = 0x87852018    r9 = 0x87866000   r10 = 0x9483dcc8    sp = 0x9483bc70
     pc = 0x9c750687
    Found by: call frame info

What should I do ? Please Help me, Thank you very much.

lieff commented 7 years ago

I have following assumptions: 1. input\output data not float aligned (unlikely, I do not see any register that looks like address and not 8 bytes aligned) 2. input\output buffers too small, note that ne10 uses complex fft length 3. your CPU do not support neon, so it crashed on neon instruction. Also you can build part of code on linux\windows and check it using valgrind or drmemory.

KevinKingZhan commented 7 years ago

Yeah, lieff. The same code is running on the same device, and most of the time it is normal, and occasionally the Error Occured. So The device is support neon. And nfft Size is Larger than 3. My Code is :

NeonFFTAccelerate::NeonFFTAccelerate(size_t nfft):FftRoutine(nfft) {
    if (ne10_init() != NE10_OK) {
            LOGE("Failed to initialise Ne10.");
    }   
    m_nfft = nfft;
    cfg = ne10_fft_alloc_r2c_float32(nfft);
    in = (ne10_float32_t*) NE10_MALLOC (nfft * sizeof (ne10_float32_t));
    out = (ne10_fft_cpx_float32_t*) NE10_MALLOC (nfft * sizeof (ne10_fft_cpx_float32_t));
}

NeonFFTAccelerate::~NeonFFTAccelerate() {
    NE10_FREE(in);
    NE10_FREE(out);
    NE10_FREE(cfg);
}

void NeonFFTAccelerate::fft_forward(float* input, float* output_re, float* output_im) {
    memcpy(in, input, sizeof(float) * m_nfft);
    ne10_fft_r2c_1d_float32(out, in, cfg);
    for (int i = 0; i < m_nfft / 2; i++) {
        output_re[i] = out[i].r;
        output_im[i] = out[i].i;
    }   
}

The fft size nfft is 2048 or 8192. is The attribute cfg must create in fft_forward method? is problem exist in My Code? Thank you very much! @lieff

lieff commented 7 years ago

Hmm, I do not see any problem, cfg can be created in constructor. May be memory corruption in some other place. Crash at beginning ne10_mixed_radix_fft_forward_float32_neon, it can happen if it reads invalid values from cfg factors[] buffer. This buffer used in previous function, so use-after-free and parallel destructor call is less probability.

KevinKingZhan commented 7 years ago

OK, I will try alloc cfg, IN, OUT In every fft_forward call. Thank you lieff.

I will leave message When I Finished All Test-Case. Thank you! @lieff

KevinKingZhan commented 7 years ago

Hi, lieff. Because of We think cfg may be updated by other thread. So I modifyed Code :

NeonFFTAccelerate::NeonFFTAccelerate(size_t nfft):FftRoutine(nfft) {
    if (ne10_init() != NE10_OK) {
            LOGE("Failed to initialise Ne10.");
    }
}

NeonFFTAccelerate::~NeonFFTAccelerate() {
}

void NeonFFTAccelerate::fft_forward(float* input, float* output_re, float* output_im) {
    ne10_fft_r2c_cfg_float32_t cfg = ne10_fft_alloc_r2c_float32(m_nfft);
    ne10_float32_t* in = (ne10_float32_t*) NE10_MALLOC (m_nfft * sizeof (ne10_float32_t));
    ne10_fft_cpx_float32_t* out = (ne10_fft_cpx_float32_t*) NE10_MALLOC (m_nfft * sizeof (ne10_fft_cpx_float32_t));
    memcpy(in, input, sizeof(float) * m_nfft);
    ne10_fft_r2c_1d_float32(out, in, cfg);
    for (int i = 0; i < m_nfft / 2; i++) {
        output_re[i] = out[i].r;
        output_im[i] = out[i].i;
    }
    NE10_FREE(in);
    NE10_FREE(out);
    NE10_FREE(cfg);
}

void NeonFFTAccelerate::fft_inverse(float* input_re, float* input_im, float* output) {
    ne10_fft_r2c_cfg_float32_t cfg = ne10_fft_alloc_r2c_float32(m_nfft);
    ne10_float32_t* in = (ne10_float32_t*) NE10_MALLOC (m_nfft * sizeof (ne10_float32_t));
    ne10_fft_cpx_float32_t* out = (ne10_fft_cpx_float32_t*) NE10_MALLOC (m_nfft * sizeof (ne10_fft_cpx_float32_t));
    for (int i = 0; i < m_nfft / 2; i++) {
        out[i].r = input_re[i];
        out[i].i = input_im[i];
    }
    ne10_fft_c2r_1d_float32(in, out, cfg);
    memcpy(output, in, sizeof(float) * m_nfft);
    NE10_FREE(in);
    NE10_FREE(out);
    NE10_FREE(cfg);
}

And The Crash has happened by Accident. The log is :

Operating system: Android
                  0.0.0 Linux 3.18.31-perf-g0bf156d-00671-gcef62fe #1 SMP PREEMPT Thu Apr 6 20:31:34 CST 2017 armv8l
CPU: arm
     ARMv1 Qualcomm part(0x51002150) features: half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,tls,vfpv4,idiva,idivt
     4 CPUs

Crash reason:  SIGSEGV
Crash address: 0xb8680000
Process uptime: not available

Thread 0 (crashed)
 0  libsongstudio.so!ne10_mixed_radix_fft_forward_float32_neon + 0x5d
     r0 = 0xb81d1798    r1 = 0xb867e000    r2 = 0x00000002    r3 = 0xb81ce798
     r4 = 0x00000005    r5 = 0x00000100    r6 = 0x00000004    r7 = 0xb867e810
     r8 = 0xb867f010    r9 = 0xb867f810   r10 = 0xb8680000   r12 = 0x00000004
     fp = 0xb81d3758    sp = 0xffaa9c50    lr = 0xb81d1798    pc = 0xc86d544e
    Found by: given as instruction pointer in context
 1  dalvik-main space 1 (deleted) + 0xd1ffffe
     sp = 0xffaa9c58    pc = 0x3fe00000
    Found by: stack scanning
 2  dalvik-main space 1 (deleted) + 0x147b7ffe
     sp = 0xffaa9c64    pc = 0x473b8000
    Found by: stack scanning
 3  dalvik-main space 1 (deleted) + 0x123fdffe
     sp = 0xffaa9c68    pc = 0x44ffe000
    Found by: stack scanning
 4  dalvik-main space 1 (deleted) + 0xe890fd9
     sp = 0xffaa9c6c    pc = 0x41490fdb
    Found by: stack scanning
 5  libsongstudio.so!ne10_fft_c2c_1d_float32_neon [NE10_fft_float32.neon.c : 748 + 0x9]
     sp = 0xffaa9cb8    pc = 0xc86b68fb
    Found by: stack scanning
 6  libsongstudio.so!ne10_fft_r2c_1d_float32_neon [NE10_fft_float32.neon.c : 786 + 0x3]
     r4 = 0xb81d1798    r5 = 0xb849f840    r6 = 0xb81ce798    r7 = 0xb81ce698
     r8 = 0xb81ce680    sp = 0xffaa9d68    pc = 0xc86b6953
    Found by: call frame info
 7  libsongstudio.so!NeonFFTAccelerate::fft_forward [neon_fft_accelerate.cpp : 35 + 0xd]
     r4 = 0xc86b6921    r5 = 0x00000800    r6 = 0xb80f5000    r7 = 0xb95629a0
     r8 = 0xb849f840    r9 = 0xb867e000   r10 = 0xb81ce680    sp = 0xffaa9de8
     pc = 0xc86a0655
    Found by: call frame info

And My Test Device is Xiaomi5 with Andorid 7.0 System.

BTW: For Get the correct stack trace, So I compiled Ne10 source to My project. I have Found The NE10_fft_float32.neon.c File Called NE10_fft_float32.neon.s. So I have not Compiled NE10_fft_float32.neonintrinsic.c file. Is My Compiled Process is right? Thank you very much. @lieff

lieff commented 7 years ago

Yes .s file used in my case. Here ne10_mixed_radix_fft_forward_float32_neon + 0x5d crash place:

.text:00000010                 EXPORT ne10_mixed_radix_fft_forward_float32_neon
.text:00000010 ne10_mixed_radix_fft_forward_float32_neon
.text:00000010
.text:00000010 arg_0           =  0
.text:00000010
.text:00000010                 PUSH.W          {R4-R12,LR}
.text:00000014                 VPUSH           {D8-D15}
.text:00000018                 LDR             R4, [R2]
.text:0000001A                 LDR             R5, [R2,#4]
.text:0000001C                 ADD.W           R2, R2, R4,LSL#3
.text:00000020                 LDR.W           R12, [R2]
.text:00000024                 LDR.W           R6, [R2,#-4]
.text:00000028                 MOV             LR, R0
.text:0000002A                 CMP.W           R12, #8
.text:0000002E                 BEQ             loc_BC
.text:00000030                 MOV             R2, R5
.text:00000032                 MOV             R7, R1
.text:00000034                 MOV             R11, R0
.text:00000036                 ADD.W           R9, R7, R5,LSL#4
.text:0000003A                 ADD.W           R8, R7, R5,LSL#3
.text:0000003E                 ADD.W           R10, R9, R5,LSL#3
.text:00000042                 VLD2.32         {D0-D1}, [R7@64]!
.text:00000046                 VLD2.32         {D4-D5}, [R9@64]!
.text:0000004A                 VLD2.32         {D2-D3}, [R8@64]!
.text:0000004E                 VLD2.32         {D6-D7}, [R10@64]!
.text:00000052
.text:00000052 loc_52                                  ; CODE XREF: ne10_mixed_radix_fft_forward_float32_neon+94j
.text:00000052                 VADD.F32        Q4, Q0, Q2
.text:00000056                 VSUB.F32        Q5, Q0, Q2
.text:0000005A                 VLD2.32         {D0-D1}, [R7@64]!
.text:0000005E                 VLD2.32         {D4-D5}, [R9@64]!
.text:00000062                 VADD.F32        Q6, Q1, Q3
.text:00000066                 VSUB.F32        Q7, Q1, Q3
.text:0000006A                 VLD2.32         {D2-D3}, [R8@64]!  <<
.text:0000006E                 VLD2.32         {D6-D7}, [R10@64]! << crash
.text:00000072                 VSUB.F32        Q10, Q4, Q6
.text:00000076                 VADD.F32        Q8, Q4, Q6
.text:0000007A                 VADD.F32        D18, D10, D15
.text:0000007E                 VSUB.F32        D19, D11, D14
.text:00000082                 VSUB.F32        D22, D10, D15
.text:00000086                 VADD.F32        D23, D11, D14
.text:0000008A                 VTRN.32         Q8, Q9
.text:0000008E                 VTRN.32         Q10, Q11
.text:00000092                 VST2.32         {D16-D17}, [R11]!
.text:00000096                 VST2.32         {D20-D21}, [R11]!
.text:0000009A                 VST2.32         {D18-D19}, [R11]!
.text:0000009E                 VST2.32         {D22-D23}, [R11]!
.text:000000A2                 SUBS            R2, #2

It`s inside from NE10_fft_float32.neon.s:

         BUTTERFLY4X2_WITHOUT_TWIDDLES "FALSE"

So loads without adding fstride not crashed, but load with this additions crashed:

        add             p_fin2, p_fin0, fstride, lsl #4   /* get the address of F[fstride*2] */
        add             p_fin1, p_fin0, fstride, lsl #3   /* get the address of F[fstride] */
        add             p_fin3, p_fin2, fstride, lsl #3   /* get the address of F[fstride*3] */

So, it still looks like memory corruption. Also strange that dalvik-main appears in middle of stack trace. It's possible that corruption done by signal handler or some soft context switch code using setjmp/longjmp.

KevinKingZhan commented 7 years ago

Thank you lieff. is The problem related for My multithreaded programming? Or related for I use Shared_ptr?

Thank you Very much! @lieff

lieff commented 7 years ago

It's possible multi-threading bug, but there no need for multi-threading to corrupt stack or data using signal handler or longjmp. Check that input floats is valid float values. You can also try push fake static input data in fft_forward/fft_inverse (different input data is last thing it can be ne10 bug).

joesavage commented 7 years ago

It seems like there are a lot of moving parts here, so it's difficult to tell exactly what's at fault. It's probably worth trying to reduce the problem down to a smaller test case. For instance, have you tried running your application with only a single thread? And does the FFT routine produce the correct results for the crashing inputs in a simpler test program?

Failing this, it seems like your best bet is to dive into deeper runtime analysis. Are you able to run the problematic binary through a debugger? Running gdb with handle SIGSEGV stop, and checking the crashing assembly instructions (e.g. with layout asm) seems like a start. It looks like there might be pieces of this calculation above, but it's always worth having a machine verify these sorts of calculations where possible, as they can get rather messy (depending on assemblers, linkers, etc.) From there, you can examine exactly what the issue appears to be, and with any luck track the problem back to its source (e.g. via breakpoints and watchpoints). If this does end up being an issue with Ne10, please do let us know :)

KevinKingZhan commented 7 years ago

Thank you, @joesavage @lieff . I will build a smaller test case for This Problem!

zhangkg5 commented 7 years ago

Hi, @joesavage @lieff . I worked with @KevinKingZhan and I found that, if I complied libNe10.a with the setting " set(NE10_ASM_OPTIMIZATION off)" on Android platform in CMakeLists.txt , my project never crashed. but if the setting was changed to be " set(NE10_ASM_OPTIMIZATION on)", it crashed and the Crash dump said " ./modules/dsp/NE10_fft_float32.neon.s:498". And I found a setting " set(CMAKE_ASM_COMPILER_WORKS TRUE)" in ./android/android_config.cmake file. So I want to know how could the settings make a difference in the result ,and what`s the function of these two settings? Please help me , thanks you !

lieff commented 6 years ago

@zhangkg5 With NE10_ASM_OPTIMIZATION=off, intrinsic neon versions enabled (NE10_fft_float32.neonintrinsic.c). You loose not much if you use it (sometimes compilers compiles intrinsic code even better than handwritten asm). I cannot find error in asm version, may be it's some hard-to-find bug, may be it's ABI difference with rest C code, may be it's hard-to-find hardware bug (I've seen this kind of bugs too, usually on 8-core android where 4cores is one architecture and 4cores other low-power architecture).

niyeyangqian commented 2 years ago

hi, I have same problem with you, have you slove it? @KevinKingZhan

lieff commented 2 years ago

@niyeyangqian do you have stack trace of the crash?

niyeyangqian commented 2 years ago

backtrace as below:

12-14 11:03:28.387 F/libc    ( 5444): Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xbb743000 in tid 5680 (Thread-2), pid 5444 (ttsdemo)
12-14 11:03:28.446 F/DEBUG   ( 5782): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
12-14 11:03:28.447 F/DEBUG   ( 5782): Build fingerprint: 'Xiaomi/cas/cas:11/RKQ1.200826.002/V12.5.8.0.RJJCNXM:user/release-keys'
12-14 11:03:28.447 F/DEBUG   ( 5782): Revision: '0'
12-14 11:03:28.447 F/DEBUG   ( 5782): ABI: 'arm'
12-14 11:03:28.447 F/DEBUG   ( 5782): Timestamp: 2021-12-14 11:03:28+0800
12-14 11:03:28.447 F/DEBUG   ( 5782): pid: 5444, tid: 5680, name: Thread-2  >>> com.ttsdemo <<<
12-14 11:03:28.447 F/DEBUG   ( 5782): uid: 10353
12-14 11:03:28.447 F/DEBUG   ( 5782): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xbb743000
12-14 11:03:28.447 F/DEBUG   ( 5782):     r0  b839e000  r1  bb741000  r2  00000002  r3  b8d2da20
12-14 11:03:28.447 F/DEBUG   ( 5782):     r4  00000005  r5  00000100  r6  00000004  r7  bb741810
12-14 11:03:28.447 F/DEBUG   ( 5782):     r8  bb742010  r9  bb742810  r10 bb743000  r11 b839ffc0
12-14 11:03:28.447 F/DEBUG   ( 5782):     ip  00000004  sp  bb6f4120  lr  b839e000  pc  bb25279e
12-14 11:03:28.448 F/DEBUG   ( 5782): backtrace:
12-14 11:03:28.448 F/DEBUG   ( 5782):       #00 pc 0070d79e  /data/app/~~BLewhi4levzGjxg0d8RYIg==/com.ttsdemo-MWwGIhdCOLFQn3SuyKF-4g==/lib/arm/libptts.so (BuildId: 03a80ac9d8cbc705322c3a50353c9dcb8a7e8972)

and I convert pc 0070d79e with NDK bulitin addr2line tool,it shows:

ne10_mixed_radix_fft_forward_float32_neon
/mnt/.../third_party/ne10/dsp/NE10_fft_float32.neon.s:498

which refers to:

.L_ne10_radix4_butterfly_first_stage_fstride:
        BUTTERFLY4X2_WITHOUT_TWIDDLES "FALSE" // line 498

        subs            count, count, #2
        bgt             .L_ne10_radix4_butterfly_first_stage_fstride

        /* swap input/output buffer  */
        ldr             tmp0, [sp, #104]
        mov             p_fin, p_fout
        mov             p_fout, tmp0

        /* (stage_count-2): reduce the counter for the last stage  */
        sub             stage_count, stage_count, #2
        lsl             nstep, fstride, #3
        lsr             fstride, fstride, #2

        b               .L_ne10_butterfly_other_stages
        /* ---------------end of first stage: radix 4  */

I do not know what does line 498 mean.

lieff commented 2 years ago

Original stack trace contains some wired stuff:

  1. Stack frame 0 contains __eabi_dmul calls NE10_fft_generic_float32.neonintrinsic.cpp which makes no sense.
  2. Frames below contains sigwait.

Which makes me think bug can be related to signals handling. So we need full new call stack with debug info to check this theory. Also build with ASAN can help debug memory related issues.

niyeyangqian commented 2 years ago

sorry, how you see stack frame, frame 0 contains __eabi_dmul call NE10_fft_generic_float32.neonintrinsic.cpp and sigwait?

lieff commented 2 years ago

It's from original issue stack trace: image