Open KevinKingZhan opened 7 years ago
Stack seems corrupted. Or fft routine called from signal handler?
I use Compile Flag :-mfloat-abi=hard -mfpu=vfp3 in My Build shared Library. is this flag impact some neonintrinsic Function(ex:vmlaq_f32, vst2q_f32 etc.)?
mfloat-abi impact float parameters passing between functions, mfpu impact code generation. I usually use -mfpu=neon -ffast-math -fvectorize for compiler neon code generation. Also I do not see any *_dmul runtime functions calling from ne10, NE10_fft_generic_float32.neonintrinsic.cpp.o only imports
U __aeabi_idiv
U __aeabi_unwind_cpp_pr0
in my case. Here how I build ne10:
mkdir build && cd build
export ANDROID_NDK=~/Android/Sdk/ndk-bundle
export NE10_ANDROID_TARGET_ARCH=armv7
cmake -DCMAKE_TOOLCHAIN_FILE=../android/android_config.cmake ..
make
Thank you lieff, I Use Compile Flag : -mfloat-abi=softfp -mfpu=neon And The Error was happened by Accident.
The Method Stack Trace is :
0 libsongstudio.so!ne10_mixed_radix_fft_forward_float32_neon + 0x5d
r0 = 0x8f826118 r1 = 0x94438000 r2 = 0x00000002 r3 = 0x8f81a118
r4 = 0x00000006 r5 = 0x00000400 r6 = 0x00000004 r7 = 0x9443a010
r8 = 0x9443c010 r9 = 0x9443e010 r10 = 0x94440000 r12 = 0x00000004
fp = 0x8f82e0d8 sp = 0x9483bad8 lr = 0x8f826118 pc = 0x9c78549e
Found by: given as instruction pointer in context
1 libsongstudio.so!ne10_fft_c2c_1d_float32_neon [NE10_fft_float32.neon.c : 748 + 0xf]
sp = 0x9483bb40 pc = 0x9c766913
Found by: stack scanning
2 libsongstudio.so!ne10_fft_r2c_1d_float32_neon [NE10_fft_float32.neon.c : 788 + 0xb]
r4 = 0x8f826118 r5 = 0x94428000 r6 = 0x8f81a000 r7 = 0x8f81a018
r8 = 0x94438000 sp = 0x9483bbf0 pc = 0x9c766995
Found by: call frame info
3 libsongstudio.so!NeonFFTAccelerate::fft_forward [neon_fft_accelerate.cpp : 34 + 0xd]
r4 = 0x9845eb20 r5 = 0x9cb1bb2c r6 = 0x00002000 r7 = 0x8784a018
r8 = 0x87852018 r9 = 0x87866000 r10 = 0x9483dcc8 sp = 0x9483bc70
pc = 0x9c750687
Found by: call frame info
What should I do ? Please Help me, Thank you very much.
I have following assumptions: 1. input\output data not float aligned (unlikely, I do not see any register that looks like address and not 8 bytes aligned) 2. input\output buffers too small, note that ne10 uses complex fft length 3. your CPU do not support neon, so it crashed on neon instruction. Also you can build part of code on linux\windows and check it using valgrind or drmemory.
Yeah, lieff. The same code is running on the same device, and most of the time it is normal, and occasionally the Error Occured. So The device is support neon. And nfft Size is Larger than 3. My Code is :
NeonFFTAccelerate::NeonFFTAccelerate(size_t nfft):FftRoutine(nfft) {
if (ne10_init() != NE10_OK) {
LOGE("Failed to initialise Ne10.");
}
m_nfft = nfft;
cfg = ne10_fft_alloc_r2c_float32(nfft);
in = (ne10_float32_t*) NE10_MALLOC (nfft * sizeof (ne10_float32_t));
out = (ne10_fft_cpx_float32_t*) NE10_MALLOC (nfft * sizeof (ne10_fft_cpx_float32_t));
}
NeonFFTAccelerate::~NeonFFTAccelerate() {
NE10_FREE(in);
NE10_FREE(out);
NE10_FREE(cfg);
}
void NeonFFTAccelerate::fft_forward(float* input, float* output_re, float* output_im) {
memcpy(in, input, sizeof(float) * m_nfft);
ne10_fft_r2c_1d_float32(out, in, cfg);
for (int i = 0; i < m_nfft / 2; i++) {
output_re[i] = out[i].r;
output_im[i] = out[i].i;
}
}
The fft size nfft is 2048 or 8192. is The attribute cfg must create in fft_forward method? is problem exist in My Code? Thank you very much! @lieff
Hmm, I do not see any problem, cfg can be created in constructor. May be memory corruption in some other place. Crash at beginning ne10_mixed_radix_fft_forward_float32_neon, it can happen if it reads invalid values from cfg factors[] buffer. This buffer used in previous function, so use-after-free and parallel destructor call is less probability.
OK, I will try alloc cfg, IN, OUT In every fft_forward call. Thank you lieff.
I will leave message When I Finished All Test-Case. Thank you! @lieff
Hi, lieff. Because of We think cfg may be updated by other thread. So I modifyed Code :
NeonFFTAccelerate::NeonFFTAccelerate(size_t nfft):FftRoutine(nfft) {
if (ne10_init() != NE10_OK) {
LOGE("Failed to initialise Ne10.");
}
}
NeonFFTAccelerate::~NeonFFTAccelerate() {
}
void NeonFFTAccelerate::fft_forward(float* input, float* output_re, float* output_im) {
ne10_fft_r2c_cfg_float32_t cfg = ne10_fft_alloc_r2c_float32(m_nfft);
ne10_float32_t* in = (ne10_float32_t*) NE10_MALLOC (m_nfft * sizeof (ne10_float32_t));
ne10_fft_cpx_float32_t* out = (ne10_fft_cpx_float32_t*) NE10_MALLOC (m_nfft * sizeof (ne10_fft_cpx_float32_t));
memcpy(in, input, sizeof(float) * m_nfft);
ne10_fft_r2c_1d_float32(out, in, cfg);
for (int i = 0; i < m_nfft / 2; i++) {
output_re[i] = out[i].r;
output_im[i] = out[i].i;
}
NE10_FREE(in);
NE10_FREE(out);
NE10_FREE(cfg);
}
void NeonFFTAccelerate::fft_inverse(float* input_re, float* input_im, float* output) {
ne10_fft_r2c_cfg_float32_t cfg = ne10_fft_alloc_r2c_float32(m_nfft);
ne10_float32_t* in = (ne10_float32_t*) NE10_MALLOC (m_nfft * sizeof (ne10_float32_t));
ne10_fft_cpx_float32_t* out = (ne10_fft_cpx_float32_t*) NE10_MALLOC (m_nfft * sizeof (ne10_fft_cpx_float32_t));
for (int i = 0; i < m_nfft / 2; i++) {
out[i].r = input_re[i];
out[i].i = input_im[i];
}
ne10_fft_c2r_1d_float32(in, out, cfg);
memcpy(output, in, sizeof(float) * m_nfft);
NE10_FREE(in);
NE10_FREE(out);
NE10_FREE(cfg);
}
And The Crash has happened by Accident. The log is :
Operating system: Android
0.0.0 Linux 3.18.31-perf-g0bf156d-00671-gcef62fe #1 SMP PREEMPT Thu Apr 6 20:31:34 CST 2017 armv8l
CPU: arm
ARMv1 Qualcomm part(0x51002150) features: half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,tls,vfpv4,idiva,idivt
4 CPUs
Crash reason: SIGSEGV
Crash address: 0xb8680000
Process uptime: not available
Thread 0 (crashed)
0 libsongstudio.so!ne10_mixed_radix_fft_forward_float32_neon + 0x5d
r0 = 0xb81d1798 r1 = 0xb867e000 r2 = 0x00000002 r3 = 0xb81ce798
r4 = 0x00000005 r5 = 0x00000100 r6 = 0x00000004 r7 = 0xb867e810
r8 = 0xb867f010 r9 = 0xb867f810 r10 = 0xb8680000 r12 = 0x00000004
fp = 0xb81d3758 sp = 0xffaa9c50 lr = 0xb81d1798 pc = 0xc86d544e
Found by: given as instruction pointer in context
1 dalvik-main space 1 (deleted) + 0xd1ffffe
sp = 0xffaa9c58 pc = 0x3fe00000
Found by: stack scanning
2 dalvik-main space 1 (deleted) + 0x147b7ffe
sp = 0xffaa9c64 pc = 0x473b8000
Found by: stack scanning
3 dalvik-main space 1 (deleted) + 0x123fdffe
sp = 0xffaa9c68 pc = 0x44ffe000
Found by: stack scanning
4 dalvik-main space 1 (deleted) + 0xe890fd9
sp = 0xffaa9c6c pc = 0x41490fdb
Found by: stack scanning
5 libsongstudio.so!ne10_fft_c2c_1d_float32_neon [NE10_fft_float32.neon.c : 748 + 0x9]
sp = 0xffaa9cb8 pc = 0xc86b68fb
Found by: stack scanning
6 libsongstudio.so!ne10_fft_r2c_1d_float32_neon [NE10_fft_float32.neon.c : 786 + 0x3]
r4 = 0xb81d1798 r5 = 0xb849f840 r6 = 0xb81ce798 r7 = 0xb81ce698
r8 = 0xb81ce680 sp = 0xffaa9d68 pc = 0xc86b6953
Found by: call frame info
7 libsongstudio.so!NeonFFTAccelerate::fft_forward [neon_fft_accelerate.cpp : 35 + 0xd]
r4 = 0xc86b6921 r5 = 0x00000800 r6 = 0xb80f5000 r7 = 0xb95629a0
r8 = 0xb849f840 r9 = 0xb867e000 r10 = 0xb81ce680 sp = 0xffaa9de8
pc = 0xc86a0655
Found by: call frame info
And My Test Device is Xiaomi5 with Andorid 7.0 System.
BTW: For Get the correct stack trace, So I compiled Ne10 source to My project. I have Found The NE10_fft_float32.neon.c File Called NE10_fft_float32.neon.s. So I have not Compiled NE10_fft_float32.neonintrinsic.c file. Is My Compiled Process is right? Thank you very much. @lieff
Yes .s file used in my case. Here ne10_mixed_radix_fft_forward_float32_neon + 0x5d crash place:
.text:00000010 EXPORT ne10_mixed_radix_fft_forward_float32_neon
.text:00000010 ne10_mixed_radix_fft_forward_float32_neon
.text:00000010
.text:00000010 arg_0 = 0
.text:00000010
.text:00000010 PUSH.W {R4-R12,LR}
.text:00000014 VPUSH {D8-D15}
.text:00000018 LDR R4, [R2]
.text:0000001A LDR R5, [R2,#4]
.text:0000001C ADD.W R2, R2, R4,LSL#3
.text:00000020 LDR.W R12, [R2]
.text:00000024 LDR.W R6, [R2,#-4]
.text:00000028 MOV LR, R0
.text:0000002A CMP.W R12, #8
.text:0000002E BEQ loc_BC
.text:00000030 MOV R2, R5
.text:00000032 MOV R7, R1
.text:00000034 MOV R11, R0
.text:00000036 ADD.W R9, R7, R5,LSL#4
.text:0000003A ADD.W R8, R7, R5,LSL#3
.text:0000003E ADD.W R10, R9, R5,LSL#3
.text:00000042 VLD2.32 {D0-D1}, [R7@64]!
.text:00000046 VLD2.32 {D4-D5}, [R9@64]!
.text:0000004A VLD2.32 {D2-D3}, [R8@64]!
.text:0000004E VLD2.32 {D6-D7}, [R10@64]!
.text:00000052
.text:00000052 loc_52 ; CODE XREF: ne10_mixed_radix_fft_forward_float32_neon+94j
.text:00000052 VADD.F32 Q4, Q0, Q2
.text:00000056 VSUB.F32 Q5, Q0, Q2
.text:0000005A VLD2.32 {D0-D1}, [R7@64]!
.text:0000005E VLD2.32 {D4-D5}, [R9@64]!
.text:00000062 VADD.F32 Q6, Q1, Q3
.text:00000066 VSUB.F32 Q7, Q1, Q3
.text:0000006A VLD2.32 {D2-D3}, [R8@64]! <<
.text:0000006E VLD2.32 {D6-D7}, [R10@64]! << crash
.text:00000072 VSUB.F32 Q10, Q4, Q6
.text:00000076 VADD.F32 Q8, Q4, Q6
.text:0000007A VADD.F32 D18, D10, D15
.text:0000007E VSUB.F32 D19, D11, D14
.text:00000082 VSUB.F32 D22, D10, D15
.text:00000086 VADD.F32 D23, D11, D14
.text:0000008A VTRN.32 Q8, Q9
.text:0000008E VTRN.32 Q10, Q11
.text:00000092 VST2.32 {D16-D17}, [R11]!
.text:00000096 VST2.32 {D20-D21}, [R11]!
.text:0000009A VST2.32 {D18-D19}, [R11]!
.text:0000009E VST2.32 {D22-D23}, [R11]!
.text:000000A2 SUBS R2, #2
It`s inside from NE10_fft_float32.neon.s:
BUTTERFLY4X2_WITHOUT_TWIDDLES "FALSE"
So loads without adding fstride not crashed, but load with this additions crashed:
add p_fin2, p_fin0, fstride, lsl #4 /* get the address of F[fstride*2] */
add p_fin1, p_fin0, fstride, lsl #3 /* get the address of F[fstride] */
add p_fin3, p_fin2, fstride, lsl #3 /* get the address of F[fstride*3] */
So, it still looks like memory corruption. Also strange that dalvik-main appears in middle of stack trace. It's possible that corruption done by signal handler or some soft context switch code using setjmp/longjmp.
Thank you lieff. is The problem related for My multithreaded programming? Or related for I use Shared_ptr?
Thank you Very much! @lieff
It's possible multi-threading bug, but there no need for multi-threading to corrupt stack or data using signal handler or longjmp. Check that input floats is valid float values. You can also try push fake static input data in fft_forward/fft_inverse (different input data is last thing it can be ne10 bug).
It seems like there are a lot of moving parts here, so it's difficult to tell exactly what's at fault. It's probably worth trying to reduce the problem down to a smaller test case. For instance, have you tried running your application with only a single thread? And does the FFT routine produce the correct results for the crashing inputs in a simpler test program?
Failing this, it seems like your best bet is to dive into deeper runtime analysis. Are you able to run the problematic binary through a debugger? Running gdb
with handle SIGSEGV stop
, and checking the crashing assembly instructions (e.g. with layout asm
) seems like a start. It looks like there might be pieces of this calculation above, but it's always worth having a machine verify these sorts of calculations where possible, as they can get rather messy (depending on assemblers, linkers, etc.) From there, you can examine exactly what the issue appears to be, and with any luck track the problem back to its source (e.g. via breakpoints and watchpoints). If this does end up being an issue with Ne10, please do let us know :)
Thank you, @joesavage @lieff . I will build a smaller test case for This Problem!
Hi, @joesavage @lieff . I worked with @KevinKingZhan and I found that, if I complied libNe10.a with the setting " set(NE10_ASM_OPTIMIZATION off)" on Android platform in CMakeLists.txt , my project never crashed. but if the setting was changed to be " set(NE10_ASM_OPTIMIZATION on)", it crashed and the Crash dump said " ./modules/dsp/NE10_fft_float32.neon.s:498". And I found a setting " set(CMAKE_ASM_COMPILER_WORKS TRUE)" in ./android/android_config.cmake file. So I want to know how could the settings make a difference in the result ,and what`s the function of these two settings? Please help me , thanks you !
@zhangkg5 With NE10_ASM_OPTIMIZATION=off, intrinsic neon versions enabled (NE10_fft_float32.neonintrinsic.c). You loose not much if you use it (sometimes compilers compiles intrinsic code even better than handwritten asm). I cannot find error in asm version, may be it's some hard-to-find bug, may be it's ABI difference with rest C code, may be it's hard-to-find hardware bug (I've seen this kind of bugs too, usually on 8-core android where 4cores is one architecture and 4cores other low-power architecture).
hi, I have same problem with you, have you slove it? @KevinKingZhan
@niyeyangqian do you have stack trace of the crash?
backtrace as below:
12-14 11:03:28.387 F/libc ( 5444): Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xbb743000 in tid 5680 (Thread-2), pid 5444 (ttsdemo)
12-14 11:03:28.446 F/DEBUG ( 5782): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
12-14 11:03:28.447 F/DEBUG ( 5782): Build fingerprint: 'Xiaomi/cas/cas:11/RKQ1.200826.002/V12.5.8.0.RJJCNXM:user/release-keys'
12-14 11:03:28.447 F/DEBUG ( 5782): Revision: '0'
12-14 11:03:28.447 F/DEBUG ( 5782): ABI: 'arm'
12-14 11:03:28.447 F/DEBUG ( 5782): Timestamp: 2021-12-14 11:03:28+0800
12-14 11:03:28.447 F/DEBUG ( 5782): pid: 5444, tid: 5680, name: Thread-2 >>> com.ttsdemo <<<
12-14 11:03:28.447 F/DEBUG ( 5782): uid: 10353
12-14 11:03:28.447 F/DEBUG ( 5782): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xbb743000
12-14 11:03:28.447 F/DEBUG ( 5782): r0 b839e000 r1 bb741000 r2 00000002 r3 b8d2da20
12-14 11:03:28.447 F/DEBUG ( 5782): r4 00000005 r5 00000100 r6 00000004 r7 bb741810
12-14 11:03:28.447 F/DEBUG ( 5782): r8 bb742010 r9 bb742810 r10 bb743000 r11 b839ffc0
12-14 11:03:28.447 F/DEBUG ( 5782): ip 00000004 sp bb6f4120 lr b839e000 pc bb25279e
12-14 11:03:28.448 F/DEBUG ( 5782): backtrace:
12-14 11:03:28.448 F/DEBUG ( 5782): #00 pc 0070d79e /data/app/~~BLewhi4levzGjxg0d8RYIg==/com.ttsdemo-MWwGIhdCOLFQn3SuyKF-4g==/lib/arm/libptts.so (BuildId: 03a80ac9d8cbc705322c3a50353c9dcb8a7e8972)
and I convert pc 0070d79e with NDK bulitin addr2line tool,it shows:
ne10_mixed_radix_fft_forward_float32_neon
/mnt/.../third_party/ne10/dsp/NE10_fft_float32.neon.s:498
which refers to:
.L_ne10_radix4_butterfly_first_stage_fstride:
BUTTERFLY4X2_WITHOUT_TWIDDLES "FALSE" // line 498
subs count, count, #2
bgt .L_ne10_radix4_butterfly_first_stage_fstride
/* swap input/output buffer */
ldr tmp0, [sp, #104]
mov p_fin, p_fout
mov p_fout, tmp0
/* (stage_count-2): reduce the counter for the last stage */
sub stage_count, stage_count, #2
lsl nstep, fstride, #3
lsr fstride, fstride, #2
b .L_ne10_butterfly_other_stages
/* ---------------end of first stage: radix 4 */
I do not know what does line 498 mean.
Original stack trace contains some wired stuff:
Which makes me think bug can be related to signals handling. So we need full new call stack with debug info to check this theory. Also build with ASAN can help debug memory related issues.
sorry, how you see stack frame, frame 0 contains __eabi_dmul call NE10_fft_generic_float32.neonintrinsic.cpp and sigwait?
It's from original issue stack trace:
Hi, Joe Savage. I intergrated Ne10 into My Android Project, there was an Error Occured. The Log is This Crash is not happened every time. it is happened by Accident. I don't have any ideas. Please help me, Thank you!