igorski / MWEngine

Audio engine and DSP library for Android, written in C++ providing low latency performance within a musical context, while providing a Java/Kotlin API. Supports both OpenSL and AAudio.
MIT License
261 stars 45 forks source link

AudioEngine.pause() causes native crash #55

Closed harthorst closed 5 years ago

harthorst commented 5 years ago

This only happens on Cat S60. I call _engine.pause() when app goes into background. Model: Cat S60 Android Version: 6.0.1

This also happens with the MWEngine Example App

** Crash dump: ** Build fingerprint: 'Cat/CatS60/CatS60:6.0.1/MMB29M/LTE_S0201121.0_S60_0.030.01:user/release-keys' pid: 15359, tid: 15432, name: Thread-17525 >>> com.harthorst.compas <<< signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xdeadbaad Stack frame #00 pc 0000000000047548 /system/lib64/libc.so (dlfree+408) Stack frame #01 pc 0000000000019568 /system/lib64/libc.so (free+20) Stack frame #02 pc 0000000000041238 /data/app/com.harthorst.compas-2/lib/arm64/libmwengine.so (_ZN13DriverAdapter7destroyEv+36) Stack frame #03 pc 0000000000042280 /data/app/com.harthorst.compas-2/lib/arm64/libmwengine.so (_ZN11AudioEngine5startEv+496) Stack frame #04 pc 0000000000129ea8 /system/lib64/libart.so (art_quick_generic_jni_trampoline+152) Stack frame #05 pc 0000000000120518 /system/lib64/libart.so (art_quick_invoke_static_stub+600) Stack frame #06 pc 00000000001301c4 /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+344) Stack frame #07 pc 00000000004d23f0 /system/lib64/libart.so (artInterpreterToCompiledCodeBridge+212) Stack frame #08 pc 00000000002abb00 /system/lib64/libart.so (_ZN3art11interpreter6DoCallILb0ELb0EEEbPNS_9ArtMethodEPNS_6ThreadERNS_11ShadowFrameEPKNS_11InstructionEtPNS_6JValueE+480) Stack frame #09 pc 00000000000dd548 /system/lib64/libart.so (_ZN3art11interpreter15ExecuteGotoImplILb0ELb0EEENS_6JValueEPNS_6ThreadEPKNS_7DexFile8CodeItemERNS11ShadowFrameES2+22200) Stack frame #10 pc 0000000000289908 /system/lib64/libart.so (artInterpreterToInterpreterBridge+220) Stack frame #11 pc 00000000002abb00 /system/lib64/libart.so (_ZN3art11interpreter6DoCallILb0ELb0EEEbPNS_9ArtMethodEPNS_6ThreadERNS_11ShadowFrameEPKNS_11InstructionEtPNS_6JValueE+480) Stack frame #12 pc 00000000000dd548 /system/lib64/libart.so (_ZN3art11interpreter15ExecuteGotoImplILb0ELb0EEENS_6JValueEPNS_6ThreadEPKNS_7DexFile8CodeItemERNS11ShadowFrameES2+22200) Stack frame #13 pc 0000000000289510 /system/lib64/libart.so (_ZN3art11interpreter30EnterInterpreterFromEntryPointEPNS_6ThreadEPKNS_7DexFile8CodeItemEPNS_11ShadowFrameE+96) Stack frame #14 pc 000000000053ef00 /system/lib64/libart.so (artQuickToInterpreterBridge+632) Stack frame #15 pc 0000000000129fe4 /system/lib64/libart.so (art_quick_to_interpreter_bridge+100) Stack frame #16 pc 0000000000120264 /system/lib64/libart.so (art_quick_invoke_stub+580) Stack frame #17 pc 000000000013011c /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+176) Stack frame #18 pc 00000000004312e8 /system/lib64/libart.so (_ZN3art35InvokeVirtualOrInterfaceWithJValuesERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectP10_jmethodIDP6jvalue+460) Stack frame #19 pc 000000000045f994 /system/lib64/libart.so (_ZN3art6Thread14CreateCallbackEPv+744) Stack frame #20 pc 0000000000067784 /system/lib64/libc.so (_ZL15__pthread_startPv+52) Stack frame #21 pc 000000000001c604 /system/lib64/libc.so (__start_thread+16)

igorski commented 5 years ago

That is interesting. It might hint at a race condition what with that device having a whopping 8 cores. Will investigate further.

igorski commented 5 years ago

@harthorst this is something I struggle to reproduce. Does this occur on the example Activity as well ? Has this occurred after the fix for the orientation change (which dealt with a similar disposal routine) ?

harthorst commented 5 years ago

Hi! Yes, the issue also occured on the example Activity. But only on Cat60 :) . I'll fetch your changes and retest it again.

harthorst commented 5 years ago

Fetched a fresh copy, compiled and deployed it. Same issue. For me the CAT60 is not crucial - it's a good choice if you want to put a nail into the wall.

teotigraphix commented 5 years ago

I am getting crashes in pause as well, Nexus 7 2nd gen.

Plus the emulator.

igorski commented 5 years ago

I've had success with the changes proposed in https://github.com/igorski/MWEngine/pull/86

I don't have the exact hardware available as mentioned in this issue, can you guys verify?

teotigraphix commented 5 years ago

It was crashing on the emulator as well. I will compile the branch and test the sample app.

igorski commented 5 years ago

What is the exact audio driver that is used here ? OpenSL or AAudio ? @teotigraphix @harthorst

teotigraphix commented 5 years ago

I am using OpenSL right now, but I have a Pixel and Samsung tablet that can run AAudio when we get to that point.

harthorst commented 5 years ago

I'm using OpenSL. Tried out changes from branch "engine-crash-unfocus" with the Mi Pad4 - same result, no audio but app is alive.

I will make a test tomorrow on the CAT60 as well.

teotigraphix commented 5 years ago

On the Nexus 7, no crashes from the #engine-crash-focus branch.

So I thought I would checkout master again, recompiled and installed.

No crash, so I guess I am not much help on this. (I also tried the emulator again and couldn't get it to crash.)

I will check my test app as well.

harthorst commented 5 years ago

Hi. Just tested the build with my Cat60. The example app still crashes when sent to background.

igorski commented 5 years ago

I have updated the branch in the pull request (https://github.com/igorski/MWEngine/pull/86). With the logs provided by @harthorst I could deduce that an error could occur when closing an audio device that hadn't opened yet (due to race condition between engine initialization in the Java and native layers when switching between orientations / expanding/collapsing notifications bar).

I'm sorry to ask, but could you check again as I'm working on assumption solely. On physical devices as well as emulators I cannot replicate this issue (bar a phone running Android 6.0.1 that crashes for a different reason - Android Canvas bug in the OS when repeatedly rotating). If the issue is happening on an emulator, what is its configuration ?

teotigraphix commented 5 years ago

Hah, I updated the Emulator 2 days ago and it whacked all my adv settings so I had to start over.

Thinking about this now, it may have actually been the last version of the emulator or bug therein. I cannot reproduce on the emulator now, I was using the Nexus 5 (Oreo) default and a Nexus 7 (Marshmallow).

Your deduction about the race condition might actually be the fix of my problems I was having and that it was random.

@harthorst Issue of no audio but running app, I have not encountered.

harthorst commented 5 years ago

In both cases (Mi Pad4 and Cat60) the behavior is not random. I have the same result every time.

@igorski I'll recheck this tomorrow.

harthorst commented 5 years ago

Good morning. Checked out latest changes. App is still crashing on CAT60 when sent to background. Please don't hesitate to ask for further tests, I know it's a hard job to find that issue without having the hardware. I also can offer you a debug session via skype (or whatever you prefer).

** Crash dump: ** Build fingerprint: 'Cat/CatS60/CatS60:6.0.1/MMB29M/LTE_S0201121.0_S60_0.030.01:user/release-keys' pid: 17755, tid: 17768, name: Thread-1459 >>> nl.igorski.example <<< signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xdeadbaad Stack frame #00 pc 0000000000047548 /system/lib64/libc.so (dlfree+408) Stack frame #01 pc 0000000000019568 /system/lib64/libc.so (free+20) Stack frame #02 pc 0000000000069728 /data/app/nl.igorski.example-1/lib/arm64/libmwengine.so (_ZN13DriverAdapter7destroyEv+36): Routine ADSR::setDurations(int, int, int, int) at /Users/thf/tmp/MWEngine_vanilla/MWEngine/src/main/cpp/./modules/adsr.cpp:230 Stack frame #03 pc 000000000006a760 /data/app/nl.igorski.example-1/lib/arm64/libmwengine.so (_ZN11AudioEngine5startEv+480): Routine std::ndk1::ios_base::setstate(unsigned int) at /Volumes/Android/buildbot/src/android/ndk-release-r16/external/libcxx/include/ios:529 Stack frame #04 pc 000000000006cfa8 /data/app/nl.igorski.example-1/oat/arm64/base.odex (offset 0x26000) (void nl.igorski.lib.audio.mwengine.MWEngineCoreJNI.start()+124) Stack frame #05 pc 0000000000084c14 /data/app/nl.igorski.example-1/oat/arm64/base.odex (offset 0x26000) (void nl.igorski.lib.audio.mwengine.MWEngineCore.start()+72) Stack frame #06 pc 000000000002c0b8 /data/app/nl.igorski.example-1/oat/arm64/base.odex (offset 0x26000) (void nl.igorski.lib.audio.MWEngine.run()+908) Stack frame #07 pc 0000000000120264 /system/lib64/libart.so (art_quick_invoke_stub+580) Stack frame #08 pc 000000000013011c /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+176) Stack frame #09 pc 00000000004312e8 /system/lib64/libart.so (_ZN3art35InvokeVirtualOrInterfaceWithJValuesERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectP10_jmethodIDP6jvalue+460) Stack frame #10 pc 000000000045f994 /system/lib64/libart.so (_ZN3art6Thread14CreateCallbackEPv+744) Stack frame #11 pc 0000000000067784 /system/lib64/libc.so (_ZL15__pthread_startPv+52) Stack frame #12 pc 000000000001c604 /system/lib64/libc.so (start_thread+16)

igorski commented 5 years ago

@harthorst I suddenly realise something looking at your latest stacktrace, could you try updating your NDK version to 18 ? I had recently updated, while before the update I could experience these problems on occasion. I believe @teotigraphix has done the same recently.

harthorst commented 5 years ago

updated to r18b. Same here, same crash dump. BTW: I am building the app on a Mac. I can retest it on Linux this evening.

igorski commented 5 years ago

:( darn it, hoping it would be easy. Shouldn't be much different on macOS / Linux. Testing on macOS myself here.

teotigraphix commented 5 years ago

FWIW I have had segmentation faults with null pointers on the C++ level before and I wasted so much time thinking it was something else.

My gut tells me this is a null pointer.

harthorst commented 5 years ago

Same with linux. @igorski maybe you can send me an apk - just to be sure I'm not making a silly mistake.

igorski commented 5 years ago

There is a prebuilt APK available here: https://www.dropbox.com/s/vt3df06o82thpay/MWEngine-debug.apk?dl=0

harthorst commented 5 years ago

Thanks. The app still crashes on CAT60 :(

igorski commented 5 years ago

I'm the last guy to claim the problem lies with a single device, but this is mind boggling as all efforts to reproduce this (which has been done on a subset of devices and emulators, as pointed out) no longer lead to "success". Your stack traces are very helpful by the way, I just don't understand why it happens only on that device. The powerful CPU leads me to think there's a race condition happening where one or the other is required while it has been disposed or hasn't been created yet.

I am now considering using a notification type system where the engine would inform when it is ready to be disposed (when requested), before it requests from the Activity its actual destruction, but I'm not sure how that flies when said Activity's destruction sequence has initiated this to begin with... also the changes in this PR ensure that the audio drivers are disposed after performing read/writes to the songs "actors".

It's probably more worthwhile to not focus on the destruction sequence, but rather the initialization sequence. Like you mentioned (and above stacktrace shows), it fails on new creation. That would imply that (in this case) OpenSL cannot receive a grant to access the audio hardware. Not sure why that happens, but let's at least catch that in a way that doesn't crash the app.

harthorst commented 5 years ago

I didn't think much about it, but something is weird:

The CAT60 is not crashing when I bring it to foreground, it happens when I put it into background. But in the stacktrace I see that MWEngine.start() is called.

I'll investigate this next week. The CAT60 is at work now :(

The initialization sequence could be helpful for the Mi Pad 4 which does not crash but mutes when brought back to foreground. Maybe I can send it to you. I bought it for developing only.

igorski commented 5 years ago

Ah so it IS when sending to the background... The reason you see start() on the stack trace is that it triggers the rendering thread which is also where eventually the engine is paused and destroyed (e.g. crashes occurring on destruction will appear in the scope of the start() function).

Are there any other log messages being sent to the device (not from MWENGINE, but say any other system process) when the crash happens?

harthorst commented 5 years ago

Exactly! I simply start the app and send it into background. Then I see a message about app crash. I uploaded a logcat output to: https://wetransfer.com/downloads/ac69d1793ced8c80987b3785e63a220c20181119070129/2fe0af0605911afb659a1e5495b338e120181119070129/8e2796

igorski commented 5 years ago

That's interesting:

"invalid address or address of corrupt block 0x559cbd9330 passed to dlfree"

That would indicate that the OpenSLStream would be corrupt, but it can only be allocated, or null when it reaches the free instruction.

Could you open a terminal/prompt and navigate to the /src/main/-folder of the repo and run the following command (requires ndk-stack to be in the path):

adb logcat | ndk-stack -sym obj/local/arm64-v8a

This should (hopefully) relate the crash to a line in the code (when in doubt: consult this link for more information). Be sure to recompile the application on your local system (instead of using the .apk provided in a previous comment).

harthorst commented 5 years ago

Build and installed the apk. Under src/main I did not find the obj folder. I executed:

adb logcat | ndk-stack -sym src/main/cpp/src/main/cpp/obj/local/arm64-v8a/ which gave me the following output:

** Crash dump: ** Build fingerprint: 'Cat/CatS60/CatS60:6.0.1/MMB29M/LTE_S0201121.0_S60_0.030.01:user/release-keys' pid: 31656, tid: 31669, name: Thread-3563 >>> nl.igorski.example <<< signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xdeadbaad Stack frame #00 pc 0000000000047548 /system/lib64/libc.so (dlfree+408) Stack frame #01 pc 0000000000019568 /system/lib64/libc.so (free+20) Stack frame #02 pc 000000000006dc90 /data/app/nl.igorski.example-2/lib/arm64/libmwengine.so (_ZN13DriverAdapter7destroyEv+36): Routine DriverAdapter::destroy() at /Users/thf/tmp/MWEngine_vanilla/MWEngine/src/main/cpp/./drivers/adapter.cpp:70 Stack frame #03 pc 000000000006ed48 /data/app/nl.igorski.example-2/lib/arm64/libmwengine.so (_ZN11AudioEngine5startEv+496): Routine AudioEngine::start() at /Users/thf/tmp/MWEngine_vanilla/MWEngine/src/main/cpp/./audioengine.cpp:176 Stack frame #04 pc 000000000006cfa8 /data/app/nl.igorski.example-2/oat/arm64/base.odex (offset 0x26000) (void nl.igorski.lib.audio.mwengine.MWEngineCoreJNI.start()+124) Stack frame #05 pc 0000000000084c14 /data/app/nl.igorski.example-2/oat/arm64/base.odex (offset 0x26000) (void nl.igorski.lib.audio.mwengine.MWEngineCore.start()+72) Stack frame #06 pc 000000000002c0b8 /data/app/nl.igorski.example-2/oat/arm64/base.odex (offset 0x26000) (void nl.igorski.lib.audio.MWEngine.run()+908) Stack frame #07 pc 0000000000120264 /system/lib64/libart.so (art_quick_invoke_stub+580) Stack frame #08 pc 000000000013011c /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+176) Stack frame #09 pc 00000000004312e8 /system/lib64/libart.so (_ZN3art35InvokeVirtualOrInterfaceWithJValuesERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectP10_jmethodIDP6jvalue+460) Stack frame #10 pc 000000000045f994 /system/lib64/libart.so (_ZN3art6Thread14CreateCallbackEPv+744) Stack frame #11 pc 0000000000067784 /system/lib64/libc.so (_ZL15__pthread_startPv+52) Stack frame #12 pc 000000000001c604 /system/lib64/libc.so (__start_thread+16)

igorski commented 5 years ago

Thanks so much for the effort, the last stack traces are very helpful. Yet also mind boggling as they confirm what I suspected was the culprit (as mentioned before, disposal of the OpenSL Stream). The last stack trace literally points out dat the call to free(openSLStreamPointer*) leads to a crash as corrupted/unvailable memory is addressed.

It cannot be the case that a null pointer is passed as free() should be able to handle the situation. The DriverAdapter that is requesting the closing of the OpenSL Stream (from the _openslio.c file) is performing the nullptr check, as well as the disposing method (closeAndroidAudioDevice) within _openslio.c. As such it is unlikely that we're dealing with a null pointer, but rather a memory address that is no longer valid, for whatever reason...

harthorst commented 5 years ago

No problem. I have to thank you. I'd have never been come so far with my project in such a short time without that superb library.

teotigraphix commented 5 years ago

Wow I feel your pain on this one @igorski.

igorski commented 5 years ago

Hi @harthorst after further investigation I found a case where a thread unblock could be called on a non locked thread. I could recreate it on Nexus 5 and 7 devices (albeit in a slightly different manner). Can you pull from master and verify whether this (finally) addresses the issue ?

harthorst commented 5 years ago

I still see crashes, but now it's not happening every time (only about 4 of 5). Here is the stack trace:

** Crash dump: ** Build fingerprint: 'Cat/CatS60/CatS60:6.0.1/MMB29M/LTE_S0201121.0_S60_0.030.01:user/release-keys' pid: 17943, tid: 17956, name: Thread-843 >>> nl.igorski.example <<< signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xdeadbaad Stack frame #00 pc 0000000000047548 /system/lib64/libc.so (dlfree+408) Stack frame #01 pc 0000000000019568 /system/lib64/libc.so (free+20) Stack frame #02 pc 0000000000072708 /data/app/nl.igorski.example-1/lib/arm64/libmwengine.so (_ZN8MWEngine13DriverAdapter7destroyEv+40): Routine MWEngine::DriverAdapter::destroy() at /Users/thf/tmp/MWEngine_vanilla/MWEngine/src/main/cpp/./drivers/adapter.cpp:72 Stack frame #03 pc 0000000000073840 /data/app/nl.igorski.example-1/lib/arm64/libmwengine.so (_ZN8MWEngine11AudioEngine5startEv+500): Routine MWEngine::AudioEngine::start() at /Users/thf/tmp/MWEngine_vanilla/MWEngine/src/main/cpp/./audioengine.cpp:181 Stack frame #04 pc 000000000004c2a8 /data/app/nl.igorski.example-1/oat/arm64/base.odex (offset 0x28000) (void nl.igorski.lib.audio.mwengine.MWEngineCoreJNI.AudioEngine_start()+124) Stack frame #05 pc 0000000000038814 /data/app/nl.igorski.example-1/oat/arm64/base.odex (offset 0x28000) (void nl.igorski.lib.audio.mwengine.AudioEngine.start()+72) Stack frame #06 pc 000000000002edac /data/app/nl.igorski.example-1/oat/arm64/base.odex (offset 0x28000) (void nl.igorski.lib.audio.MWEngine.run()+960) Stack frame #07 pc 0000000000120264 /system/lib64/libart.so (art_quick_invoke_stub+580) Stack frame #08 pc 000000000013011c /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+176) Stack frame #09 pc 00000000004312e8 /system/lib64/libart.so (_ZN3art35InvokeVirtualOrInterfaceWithJValuesERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectP10_jmethodIDP6jvalue+460) Stack frame #10 pc 000000000045f994 /system/lib64/libart.so (_ZN3art6Thread14CreateCallbackEPv+744) Stack frame #11 pc 0000000000067784 /system/lib64/libc.so (_ZL15__pthread_startPv+52) Stack frame #12 pc 000000000001c604 /system/lib64/libc.so (__start_thread+16)

I'll also check this on my Mi Pad 4 later. I helped myself with a blacklist (Mi Pad and Cat S60). On those devices the app just kills itself when sent to background. The only difference is a startup time of a few seconds if I bring them into foreground again. So it's not a showstopper. BTW: Happy new year!!

igorski commented 5 years ago

The new year is off to a good start with an improved 20 % chance at non-failure! :D

I have just committed a new change to the master branch. There was a curious case where delete was invoked on a just closed OpenSL stream. The stream however is allocated using calloc() in a pure C implementation, and free()d accordingly when the stream is closed. The resulting attempted deletion of said freed structure is undefined behaviour.

harthorst commented 5 years ago

Yes!!! That did it. No more crashes.

igorski commented 5 years ago

NOW it's a happy new year!

Thanks for bearing with me during this 4 month issue while more-or-less flying blind in trying to recreate the exact issue/finding the root cause🥇(the irony being that the fix was removal of a single line). Definitely improved the project for a wider use base.

harthorst commented 5 years ago

Checked it on Mi Pad 4 - good news here too :)