google / oboe

Oboe is a C++ library that makes it easy to build high-performance audio apps on Android.
Apache License 2.0
3.71k stars 568 forks source link

Possible race on device disconnect, callback after close #953

Closed gavv closed 3 years ago

gavv commented 4 years ago

Android version(s): 9 Android device(s): Xiaomi Redmi Note 7 Oboe version: 1.4.2

Hi,

First of all, sorry if I'm reporting this to a wrong tracker. I'm still not 100% sure if this is Oboe bug, AAudio bug, or bug in my own code.

My app uses Oboe with BT SCO headset (AirPods) and creates two oboe streams (input and output). It also subscribes AudioManager events (from Java SDK), and when device disappears, it calls stream->stop(), then stream->close(), and then destroys stream; first for one stream, and then for another.

(I'm using shared stream with data and error callbacks).

At the same time, concurrently with the second stream close call, onAudioDeviceUpdate() callback is invoked somewhere in background. Then close() returns and I destroy the stream. And then AudioStreamLegacy segfaults in background. Before the segfault, clang UB sanitizer reports integrity check failure.

It seems to me that close() doesn't wait until onAudioDeviceUpdate() is finished, or onAudioDeviceUpdate() doesn't check that the stream is already closed, or it fails if the stream is being closed during its invocation, or it invokes something (error callback?) that is not being waited by close(), etc.

Here are the logs:

07-21 16:49:17.199  19797    19797                              MYAPP  I  stopping stream for device 35
07-21 16:49:17.199  19797    19797                             AAudio  D  AAudioStream_requestStop(0x6d948d2420)
07-21 16:49:17.200  20099    20130                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.200   2302     2958                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.202  19797    20238                        AudioRecord  I  AUDIO_INPUT_FLAG_FAST successful; frameCount 4096 -> 4096
07-21 16:49:17.212  20099    20130                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.212   2302     2958                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.228   2302     2958                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.228  20099    20130                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.229  19797    19822                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.437   2302     2958                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.437  19797    19822                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.438  20099    20130                       AudioManager  E  updatePortConfig port not found for handle: 31
07-21 16:49:17.439  19797    19797                              MYAPP  I  closing stream for device 35
07-21 16:49:17.439  19797    19797                             AAudio  D  AAudioStream_close(0x6d948d2420) called ---------------
07-21 16:49:17.439  19797    19863                  AudioStreamLegacy  D  onAudioDeviceUpdate() deviceId 14
07-21 16:49:17.439  19797    19797                       AAudioStream  D  destroying 0x6d948d2420, state = AAUDIO_STREAM_STATE_CLOSED
07-21 16:49:17.439  19797    19863                  AudioStreamLegacy  D  onAudioDeviceUpdate() DISCONNECT the stream now
07-21 16:49:17.439  19797    19797                             AAudio  D  AAudioStream_close(0x6d948d2420) returned 0 ---------
07-21 16:49:17.439  19797    19797                              MYAPP  I  stopping stream for device 31
07-21 16:49:17.439  19797    19797                             AAudio  D  AAudioStream_requestStop(0x6c948d4f80)
07-21 16:49:17.439  19797    19797                         AudioTrack  D  stop() called with 772032 frames delivered
07-21 16:49:17.441  19797    19797                              MYAPP  I  closing stream for device 31
07-21 16:49:17.441  19797    19797                             AAudio  D  AAudioStream_close(0x6c948d4f80) called ---------------
07-21 16:49:17.442  19792    19792                            wrap.sh  I  frameworks/av/media/libaaudio/src/legacy/AudioStreamLegacy.cpp:216:13: runtime error: control flow integrity check for type 'aaudio::AudioStreamLegacy' failed during non-virtual call (vtable address 0x005e8dbd52f0)
07-21 16:49:17.442  19792    19792                            wrap.sh  I  0x005e8dbd52f0: note: invalid vtable
07-21 16:49:17.442  19792    19792                            wrap.sh  I  00 00 00 00  58 de b9 8d 5e 00 00 00  40 cf b9 8d 5e 00 00 00  fc 9d 47 18 7f 00 00 00  34 4c bc 8d
07-21 16:49:17.442  19792    19792                            wrap.sh  I                ^ 
07-21 16:49:17.443  19797    19797                         AudioTrack  V  ~AudioTrack, releasing session id 25 from 19797 on behalf of 19797
07-21 16:49:17.446  19797    19797                       AAudioStream  D  destroying 0x6c948d4f80, state = AAUDIO_STREAM_STATE_CLOSED
07-21 16:49:17.446  19797    19797                             AAudio  D  AAudioStream_close(0x6c948d4f80) returned 0 ---------
07-21 16:49:17.446  19797    19797                              MYAPP  D  oboe stream destructor
07-21 16:49:17.446  19797    19797                              MYAPP  D  oboe stream destructor

Relevant lines are:

<<<< onAudioDeviceUpdate() is called in background >>>>

07-21 16:49:17.439  19797    19863                  AudioStreamLegacy  D  onAudioDeviceUpdate() DISCONNECT the stream now
07-21 16:49:17.439  19797    19797                             AAudio  D  AAudioStream_close(0x6d948d2420) returned 0 ---------

<<<< my app calls stop() >>>>

07-21 16:49:17.439  19797    19797                              MYAPP  I  stopping stream for device 31
07-21 16:49:17.439  19797    19797                             AAudio  D  AAudioStream_requestStop(0x6c948d4f80)
07-21 16:49:17.439  19797    19797                         AudioTrack  D  stop() called with 772032 frames delivered

<<<< my app calls close() >>>>

07-21 16:49:17.441  19797    19797                              MYAPP  I  closing stream for device 31
07-21 16:49:17.441  19797    19797                             AAudio  D  AAudioStream_close(0x6c948d4f80) called ---------------

<<<< sanitizer failure >>>>

07-21 16:49:17.442  19792    19792                            wrap.sh  I  frameworks/av/media/libaaudio/src/legacy/AudioStreamLegacy.cpp:216:13: runtime error: control flow integrity check for type 'aaudio::AudioStreamLegacy' failed during non-virtual call (vtable address 0x005e8dbd52f0)

And here is the backtrace:

 wrap.sh  I  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior frameworks/av/media/libaaudio/src/legacy/AudioStreamLegacy.cpp:216:13 in
   DEBUG  F  *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
   DEBUG  F  Build fingerprint: 'xiaomi/lavender/lavender:9/PKQ1.180904.001/V11.0.6.0.PFGMIXM:user/release-keys'
   DEBUG  F  Revision: '0'
   DEBUG  F  ABI: 'arm64'
   DEBUG  F  pid: 19797, tid: 19863, name: Binder:19797_4  >>> tech.boring.boringaudio <<<
   DEBUG  F  signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
   DEBUG  F  x0  0000000000000000  x1  0000000000004d97  x2  0000000000000006  x3  0000000000000008
   DEBUG  F  x4  0000000000000000  x5  0000000000000000  x6  0000000000000000  x7  0000000000000000
   DEBUG  F  x8  0000000000000083  x9  727d373c9ee44947  x10 0000000000000000  x11 fffffffc7ffffbdf
   DEBUG  F  x12 0000000000000001  x13 0000005e76454270  x14 0000000000000000  x15 0000007f16a59b5c
   DEBUG  F  x16 0000007f19d942b0  x17 0000007f19cb3cd0  x18 0000000000000001  x19 0000000000004d55
   DEBUG  F  x20 0000000000004d97  x21 0000000820810411  x22 0000005e8dbd52e0  x23 0000005e76455588
   DEBUG  F  x24 0000007094916480  x25 0000007f168772c0  x26 0000007f168fc680  x27 0000007f168ee930
   DEBUG  F  x28 000000000000dd50  x29 0000005e76454ed0
   DEBUG  F  sp  0000005e76454e90  lr  0000007f19ca7144  pc  0000007f19ca716c
   DEBUG  F  backtrace:
   DEBUG  F  #00 pc 000000000002216c  /system/lib64/libc.so (abort+116)
   DEBUG  F  #01 pc 000000000005a6d8  /data/app/tech.boring.boringaudio-jDFB_zBuiZEygtRlTZq43g==/lib/arm64/libclang_rt.asan-aarch64-android.so (offset 0x50000)
   DEBUG  F  #02 pc 0000000000059670  /data/app/tech.boring.boringaudio-jDFB_zBuiZEygtRlTZq43g==/lib/arm64/libclang_rt.asan-aarch64-android.so (offset 0x50000)
   DEBUG  F  #03 pc 00000000000640c4  /data/app/tech.boring.boringaudio-jDFB_zBuiZEygtRlTZq43g==/lib/arm64/libclang_rt.asan-aarch64-android.so (offset 0x50000) (__ubsan_handle_cfi_check_fail_abort+128)
   DEBUG  F  #04 pc 000000000002606c  /system/lib64/libaaudio.so
   DEBUG  F  #05 pc 000000000002a574  /system/lib64/libaaudio.so (__cfi_check+5492)
   DEBUG  F  #06 pc 000000000002ee58  /system/lib64/libaaudio.so (aaudio::AudioStreamLegacy::onAudioDeviceUpdate(int)+856)
   DEBUG  F  #07 pc 000000000005d3e8  /system/lib64/libaudioclient.so (android::AudioRecord::onAudioDeviceUpdate(int, int)+296)
   DEBUG  F  #08 pc 0000000000064ffc  /system/lib64/libaudioclient.so (android::AudioSystem::AudioFlingerClient::ioConfigChanged(android::audio_io_config_event, android::sp<android::AudioIoDescriptor> const&)+3300)
   DEBUG  F  #09 pc 0000000000086140  /system/lib64/libaudioclient.so (android::BnAudioFlingerClient::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+360)
   DEBUG  F  #10 pc 000000000004fbf4  /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+136)
   DEBUG  F  #11 pc 000000000005d2a4  /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+520)
   DEBUG  F  #12 pc 000000000005cfe8  /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+156)
   DEBUG  F  #13 pc 000000000005d6d8  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+108)
   DEBUG  F  #14 pc 000000000007fc5c  /system/lib64/libbinder.so (android::PoolThread::threadLoop()+24)
   DEBUG  F  #15 pc 00000000000100dc  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+284)
   DEBUG  F  #16 pc 00000000000b5a10  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+140)
   DEBUG  F  #17 pc 0000000000090328  /system/lib64/libc.so (__pthread_start(void*)+36)
   DEBUG  F  #18 pc 0000000000023a28  /system/lib64/libc.so (__start_thread+68)
 wrap.sh  I  Aborted
 wrap.sh  I  wrap.sh terminated by exit(134)

My own code looks sane to me: I just call close() and expect that it will wait everything that should be waited, and when it returns, I destroy the stream.

If my code is correct, I'm not sure whether it's related to Oboe or AAudio. Is there a chance that Oboe close() implementation is incomplete and doesn't wait for something?

Currently the reproducer is a rather large closed-source app. If my expectations to close() are correct and you think I'm using it right, and this may be related to Oboe, I can try to create a stand-alone reproducer. Please let me know if this makes sense.

philburk commented 4 years ago

This sounds like #820. Probably a "use after free". To fix this we added openStream(shared_ptr<>). See release notes for 1.4.2: https://github.com/google/oboe/releases/tag/1.4.2

std::shared_ptr<oboe::AudioStream> oboeStream; // hold onto this, maybe in an instance variable

Result result = builder.openStream(oboeStream);
gavv commented 4 years ago

@philburk Actually I already experienced #820 before this issue, found your PR, and fixed it by updating to oboe 1.4.2 and using shared stream. And this one is some another crash with different backtrace. And it's reproducing when using shared stream.

gavv commented 4 years ago

After doing more tests, I've encountered another, probably related, crash:

https://gist.github.com/gavv/b4222d9600d6ac3dab95a37e3b26c96d

Previously, I was disabling Bluetooth or powering off BT headset, my app was receiving device removal event, and was trying to close oboe streams immediately, and crashed.

This time I did a bit different test. I was stopping/starting BT SCO and oboe streams repeatedly, and after one of the restarts headset disappeared (likely because "AudioFlinger server died" from logs). It happens from time to time, I don't know why, but I'm trying to make the app robust to this situation.

So Bluetooth was enabled, BT SCO was enabled, headset was enabled, but it disappeared from device list temporary and my app received device removal event.

This time, instead of trying to close oboe streams immediately, my app waited for 3 seconds until oboe streams would report failure. However that didn't happen and after 3 seconds it started closing the streams.

It stopped and closed both streams, released shared pointers, and destroyed my AudioStreamCallback implementation. Then the app disabled BT SCO.

Then onErrorBeforeClose callback was invoked on a destroyed object, and it crashed.

Maybe the callback was triggered by disabling BT SCO, or maybe it was triggered but earlier call to onAudioDeviceUpdate() during close, I'm not sure (see logs).

gavv commented 4 years ago

The reason why I think the second crash is related is that in both crashes there are onAudioDeviceUpdate() calls, and sanitizer complains when we're still inside that function, before my code is even called.

gavv commented 4 years ago

I've tried to delay deletion of my AudioStreamCallback implementation and oboe stream. Now, after closing oboe stream, I don't delete them immediately, but instead schedule deletion after 5 minutes.

With this approach, sanitizer doesn't complain for my own code any more, but it still sometimes complains for errors in AudioStreamLegacy.cpp. A bit later after the complaint, I'm getting yet another crash, maybe caused by memory corruption.

07-22 17:19:55.208  19891    19891                       AudioManager  I  In stopBluetoothSco(), calling application: tech.boring.boringaudio
07-22 17:19:55.208   2302     3561                       AudioService  I  In stopBluetoothSco()
07-22 17:19:55.208   2302     3561                       AudioService  I  In decCount(), mStartcount: 1
07-22 17:19:55.209   2302     3561                       AudioService  I  In requestScoState(), state: 10, scoAudioMode: 0
07-22 17:19:55.209   2302     3561                       AudioService  I  In checkScoAudioState(), mScoAudioState: 3
07-22 17:19:55.209   2302     3561                       AudioService  I  In totalCount(), count: 0
07-22 17:19:55.209   2302     3561                       AudioService  I  In disconnectBluetoothScoAudioHelper(), scoAudioMode: 0, bluetoothHeadset: android.bluetooth.BluetoothHeadset@e56199e, BluetoothDevice: BC:B8:63:46:6E:85
07-22 17:19:55.209   2302     3561                       AudioService  I  In disconnectBluetoothScoAudioHelper(), calling stopScoUsingVirtualVoiceCall()
07-22 17:19:55.211  19891    20010                          OboeAudio  D  oboe_aaudio_error_thread_proc() - entering >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

<<< here I receive a callback on oboe stream for which I already called close() and it returned >>>

07-22 17:19:55.211  19891    20010                   BoringAudio:Core  D  ignoring onErrorBeforeClose() callback: stream is closed
07-22 17:19:55.211  19891    20010                          OboeAudio  D  oboe_aaudio_error_thread_proc() - exiting <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
07-22 17:19:55.212  19886    19886                            wrap.sh  I  frameworks/av/media/libaaudio/src/legacy/AudioStreamLegacy.cpp:45:39: runtime error: control flow integrity check for type 'aaudio::AudioStreamLegacy' failed during cast to unrelated type (vtable address 0x007c3c00004c)
07-22 17:19:55.212  19886    19886                            wrap.sh  I  0x007c3c00004c: note: invalid vtable
07-22 17:19:55.212  19886    19886                            wrap.sh  I  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
07-22 17:19:55.212  19886    19886                            wrap.sh  I                ^
07-22 17:19:55.214  19891    19891                       AudioManager  I  In setBluetoothScoOn(), on: false, calling application: tech.boring.boringaudio
07-22 17:19:55.214   2302     7602                       AudioService  D  setBluetoothScoOn Permission Denial for pid = 19891, uid = 10338
07-22 17:19:55.214   2302     7602                       AudioService  I  In setBluetoothScoOn(), on: false. The calling application Uid: 10338, is greater than FIRST_APPLICATION_UID exiting from setBluetoothScoOn()
07-22 17:19:55.214   2302     7602                       AudioService  I  In setbluetoothScoOn(), mForcedUseForCommExt: 0
07-22 17:19:55.265  19886    19886                            wrap.sh  I  frameworks/av/media/libaaudio/src/legacy/AudioStreamLegacy.cpp:45:39: note: check failed in /system/lib64/libaaudio.so, vtable located in [anon:cfi shadow]
07-22 17:19:55.265  19886    19886                            wrap.sh  I  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior frameworks/av/media/libaaudio/src/legacy/AudioStreamLegacy.cpp:45:39 in

................................................

07-22 17:19:55.593  19891    19891                         AudioTrack  V  Building AudioTrack with attributes: usage=2 content=1 flags=0x0 tags=[]
07-22 17:19:55.611  20015    20015                              DEBUG  F  *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-22 17:19:55.612  20015    20015                              DEBUG  F  Build fingerprint: 'xiaomi/lavender/lavender:9/PKQ1.180904.001/V11.0.6.0.PFGMIXM:user/release-keys'
07-22 17:19:55.612  20015    20015                              DEBUG  F  Revision: '0'
07-22 17:19:55.612  20015    20015                              DEBUG  F  ABI: 'arm64'
07-22 17:19:55.612  20015    20015                              DEBUG  F  pid: 19891, tid: 19934, name: AudioTrack  >>> tech.boring.boringaudio <<<
07-22 17:19:55.612  20015    20015                              DEBUG  F  signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
07-22 17:19:55.612  20015    20015                              DEBUG  F  x0  0000000000000000  x1  0000000000004dde  x2  0000000000000006  x3  0000000000000008
07-22 17:19:55.612  20015    20015                              DEBUG  F  x4  0000000000000000  x5  0000000000000000  x6  0000000000000000  x7  0000000000000000
07-22 17:19:55.612  20015    20015                              DEBUG  F  x8  0000000000000083  x9  dcb884734a20c6f6  x10 0000000000000000  x11 fffffffc7ffffbdf
07-22 17:19:55.612  20015    20015                              DEBUG  F  x12 0000000000000001  x13 0000007bf9b9f4e0  x14 0000000000000000  x15 0000007c97939b5c
07-22 17:19:55.612  20015    20015                              DEBUG  F  x16 0000007c9a9332b0  x17 0000007c9a852cd0  x18 0000000000000000  x19 0000000000004db3
07-22 17:19:55.612  20015    20015                              DEBUG  F  x20 0000000000004dde  x21 5fc5d1b81fb5fcd3  x22 0000007c9cc89010  x23 000000000000e770
07-22 17:19:55.612  20015    20015                              DEBUG  F  x24 0000007c99daa7b2  x25 0000007c99dab61e  x26 0000000000000540  x27 0000000000000300
07-22 17:19:55.612  20015    20015                              DEBUG  F  x28 0000007c101cc15c  x29 0000007bf9ba0140
07-22 17:19:55.612  20015    20015                              DEBUG  F  sp  0000007bf9ba0100  lr  0000007c9a846144  pc  0000007c9a84616c
07-22 17:19:55.621  20015    20015                              DEBUG  F  backtrace:
07-22 17:19:55.621  20015    20015                              DEBUG  F  #00 pc 000000000002216c  /system/lib64/libc.so (abort+116)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #01 pc 000000000005a6d8  /data/app/tech.boring.boringaudio-ISJ-bdL0R5Mrp4TqRNudsQ==/lib/arm64/libclang_rt.asan-aarch64-android.so (offset 0x50000)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #02 pc 0000000000059670  /data/app/tech.boring.boringaudio-ISJ-bdL0R5Mrp4TqRNudsQ==/lib/arm64/libclang_rt.asan-aarch64-android.so (offset 0x50000)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #03 pc 00000000000640c4  /data/app/tech.boring.boringaudio-ISJ-bdL0R5Mrp4TqRNudsQ==/lib/arm64/libclang_rt.asan-aarch64-android.so (offset 0x50000) (__ubsan_handle_cfi_check_fail_abort+128)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #04 pc 000000000002606c  /system/lib64/libaaudio.so
07-22 17:19:55.621  20015    20015                              DEBUG  F  #05 pc 000000000002a574  /system/lib64/libaaudio.so (__cfi_check+5492)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #06 pc 0000000000024584  /system/bin/linker64 (__dl__ZN15CFIShadowWriter7CfiFailEmPvS0_S0_+128)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #07 pc 0000000000001140  /system/lib64/libdl.so (__cfi_slowpath_diag+64)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #08 pc 000000000002f820  /system/lib64/libaaudio.so (AudioStreamLegacy_callback(int, void*, void*) [clone .cfi]+272)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #09 pc 000000000007887c  /system/lib64/libaudioclient.so (android::AudioTrack::processAudioBuffer()+4000)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #10 pc 00000000000775cc  /system/lib64/libaudioclient.so (android::AudioTrack::AudioTrackThread::threadLoop()+252)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #11 pc 0000000000010074  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+180)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #12 pc 0000000000090328  /system/lib64/libc.so (__pthread_start(void*)+36)
07-22 17:19:55.621  20015    20015                              DEBUG  F  #13 pc 0000000000023a28  /system/lib64/libc.so (__start_thread+68)

I think it confirms that free-after-use or similar issue happens with some lower-level object, not with my callback and not with oboe stream.

Maybe onAudioDeviceUpdate() is not properly synchronized?

BTW note that I got all these crashes during quite intensive "stress testing" - I randomly and repeatedly stopped and started my stream, Bluetooth on phone, and the headset (by putting AirPods into the box and getting them back). All these crashes are rare enough.

Also note that every time I stop the stream, I also stop BT SCO, and then I start it again before starting the stream. Maybe it's not optimal, but for now I keep this code because probably it helps to reproduce the crashes (I don't know).

philburk commented 4 years ago

It stopped and closed both streams, released shared pointers, and destroyed my AudioStreamCallback implementation. Then the app disabled BT SCO. Then onErrorBeforeClose callback was invoked on a destroyed object, and it crashed.

Are you saying that onErrorBeforeClose callback was invoked by a stream that had already been closed! That should not happen and would be a big problem. Have you only seen this on Android 9? Which devices and versions show this problem?

philburk commented 4 years ago

@gavv - I am trying to reproduce this error on P using a BT headset.

Do you think reproducing the crash requires opening both INPUT and OUTPUT streams.

my app was receiving device removal event,

Were you using AudioManager.registerAudioDeviceCallback() for that?

philburk commented 4 years ago

@gavv - When are you calling startBluetoothSCO() in relation to opening the stream?

How are you choosing the SCO device? Are you letting Oboe/AAudio select the device or do you look for a SCO device in the AudioDeviceInfo array?

gavv commented 4 years ago

Are you saying that onErrorBeforeClose callback was invoked by a stream that had already been closed!

It seems yes, at least in the last (third) crash.

Have you only seen this on Android 9? Which devices and versions show this problem?

So far I tested it only on one phone (Android 9, Xiaomi Redmi Note 7) on which I'm doing initial development of the app.

If it helps, I can also test it on Android 8 if it helps. For now I don't have access to other devices.

Do you think reproducing the crash requires opening both INPUT and OUTPUT streams.

I'm always using two streams, but I'm not sure if it's important. I can test it with one stream if it helps.

Were you using AudioManager.registerAudioDeviceCallback() for that?

Yes.

When are you calling startBluetoothSCO() in relation to opening the stream?

I call startBluetoothSco, wait SCO_AUDIO_STATE_CONNECTED event, then create and start oboe streams.

When I receive device removal event, I tried several approaches:

1) first crash: immediately stop, close, and destroy oboe streams, then call stopBluetoothSco() and setBluetoothScoOn(false)

2) second crash: wait for onErrorBeforeClose with ErrorDisconnected; wait until my callback returns; stop, close, and destroy oboe streams; then call stopBluetoothSco() and setBluetoothScoOn(false)

3) third crash: wait for onErrorBeforeClose with ErrorDisconnected; wait until my callback returns; stop and close oboe streams; don't destroy them; then call stopBluetoothSco() and setBluetoothScoOn(false)

How are you choosing the SCO device? Are you letting Oboe/AAudio select the device or do you look for a SCO device in the AudioDeviceInfo array?

I get IDs from AudioDeviceInfo and pass them to oboe. You can see those IDs in logs above.

gavv commented 4 years ago

Thanks for looking into this! I'll be offline (mostly) a few days. I can provide more info or prepare a reproducer on next week.

philburk commented 4 years ago

Thanks! Those details are very helpful. I am now seeing a crash that may be related.

Also please, what are you using for sampleRate, channelCount and inputPreset?

philburk commented 4 years ago

Note that setBluetoothScoOn() "should only be used by applications that replace the platform-wide management of audio settings or the main telephony application." Is that true for your app?

https://developer.android.com/reference/android/media/AudioManager#setBluetoothScoOn(boolean)

philburk commented 4 years ago

I have been doing a lot of testing on this and found that if a legacy AAudio stream is not stopped then a data callback, and possibly an error callback, can occur after the stream has been closed and deleted. This is more likely to occur when a routing change is occurring.

If an app is closing and restarting streams from two threads, one Java and one triggered by an error callback, then a race condition in the application could cause this error. Consider the following sequence:

JNI Thread , Error Callback

stop()
close()
open()
               stop()
start()
               close()
               late callback => crash!

The stream is running when the close is called, causing the crash.

To avoid this problem, the app can make sure the stop and close occur together by putting them all under a single lock:

{
    scoped lock
    stop()
    close()
    open()
    start()
}

The app may wish to do all stream management under a single thread. The Java code and the error callback can send messages to that thread so that everything is coherent. I will provide an example of this. #960

In Oboe, we can force a stop from the close() method under a lock.

gavv commented 4 years ago

Sorry for delay, I'm back.

Also please, what are you using for sampleRate, channelCount and inputPreset?

I'm using device-native parameters:

playback stream: device=30 api=AAudio dir=Output format=I16 channelCount=2 sampleRate=48000
sharingMode=Shared perfMode=LowLatency usage=VoiceCommunication contentType=Speech
inputPreset=VoiceRecognition samplesPerCallback=0
recording stream: device=29 api=AAudio dir=Input format=I16 channelCount=2 sampleRate=48000
sharingMode=Shared perfMode=LowLatency usage=VoiceCommunication contentType=Speech
inputPreset=Unprocessed samplesPerCallback=0

Note that setBluetoothScoOn() "should only be used by applications that replace the platform-wide management of audio settings or the main telephony application." Is that true for your app?

Oops, setBluetoothScoOn() is unneeded here. I removed it and nothing changed. I only called setBluetoothScoOn(false) after stopBluetoothSco(), so maybe it had no effect at all.

gavv commented 4 years ago

If an app is closing and restarting streams from two threads, one Java and one triggered by an error callback, then a race condition in the application could cause this error. Consider the following sequence:

Hmm.

Here is how I was closing streams in, for example, third test above:

  1. receive AudioManager device removal event;
  2. wait for onErrorBeforeClose() for 3 seconds; it's not called;
  3. lock mutex;
  4. stop first stream;
  5. close first stream;
  6. stop second stream;
  7. close second stream;
  8. don't destroy streams;
  9. unlock mutex;
  10. call stopBluetoothSco, wait a bit;
  11. onErrorBeforeClose is called on closed stream;

All manipulations with oboe streams, except data callback, are performed under a mutex. The mutex protects both streams. I always stop and close both streams and only then open two new streams.

Also, it seems that the crash happens before creating and opening new streams. The stream is stopped and closed, then the error callback is called on closed stream, and I didn't try to create a new one yet.

gavv commented 4 years ago

@philburk I've prepared a minimal app reproducing the problem. Although it's rather small, my customer is more comfortable with not making it fully public, so I've pushed it to a private repo on github and sent you an invite. Hope this is OK.

The app has two buttons, start and stop. Start button selects device IDs, starts BT SCO, and creates two oboe streams. Stop button closes and stops oboe streams and stops BT SCO.

The app never destroys oboe streams. This memory leak is intentional. It allows us to print a log message when callback is called after close, instead of crashing. The app still crashes, but not in our code.

Also, the app monitors device manager events and automatically invokes start when a BT SCO device appears and stop when it disappears. Actually, to reproduce the problem, you don't even need to press start and stop buttons manually. They are needed just for convenience.

I can reproduce two variations of the problem using this example app:

  1. A crash in onAudioDeviceUpdate(). Happens rather frequently on my device.
  2. A call to onErrorBeforeClose() on closed stream and then a crash in onAudioDeviceUpdate(). Happens not so frequently, but I think this is just a special case of the first issue.

This github gist provides the logs for both issues: https://gist.github.com/gavv/5eda6ae760c2eb979c34efb0e64eb272

For callback-after-close issue, search for these lines:

08-01 21:32:25.259 21896 22315 D ExampleApp: got onErrorBeforeClose() on input stream for device 1276 with error ErrorDisconnected
08-01 21:32:25.259 21896 22315 D ExampleApp: !!!!!!!! got error callback on closed stream !!!!!!!!

Here is how to reproduce the problem:

  1. open app
  2. enable bluetooth in quick settings panel and wait until headset is connected, the app detects it, and starts oboe streams
  3. disable bluetooth in quick settings panel, or alternatively disable headset (by putting airpods into the box in my case), and wait until the app detects it and stops and closes oboe streams
  4. if the app didn't crash, go to step 2 and repeat

Eventually, the app will crash after step 3 and you'll see a crash and possibly also callback-after-close message in logs.

When I keep the call to randomDelay() in MainActivity commented out, I get a crash once per 5-10 iterations, but never get callback-after-close. When I uncomment randomDelay() in MainActivity, I get callback-after-close once per 15-20 iterations.

To build and deploy the app, you can just run build.sh and deploy.sh. The build script uses docker to freeze versions of SDK and NDK. The very first build is very slow.

The app comes with clang UB sanitizer enabled. I'm using the debug version of the APK.

philburk commented 4 years ago

Thank you! Being able to reproduce the problem is very important. I have found the invitation and am looking at the repo. I really appreciate this.

philburk commented 4 years ago

Thanks gavv. I am able to reproduce the crash consistently with your test program when running P. I tried many times to crash on the current master and on QQ3A and it did not crash.

I could only reproduce this when using Bluetooth

I then tried calling usleep(N*1000) before the stream is closed. N. crashes:tries 10 => 0:10 5 => 0:10 1 => 1:10 in onDeviceUpdate() no => 8:11

So it is a pretty tight race. It seems a short delay right before the close is enough to prevent the race. Five msec works OK but ten msec should be a safer value. I will add these delays to Oboe.

gavv commented 4 years ago

Thanks! So, presumably, this is indeed a race in AAudio? And we don't know whether it's fixed or just not reproducing on master, right?

Should we report it? I'm not familiar with the source code, but after quick look, I didn't find a place where AudioStreamLegacy waits until onDeviceUpdate is exited and unregistered, before closing the stream.

philburk commented 4 years ago

And we don't know whether it's fixed or just not reproducing on master, right?

Right. That is the important question. I am looking at old bugs and change logs for this. It is being tracked internally as b/161914201

I didn't find a place where AudioStreamLegacy waits until onDeviceUpdate is exited and unregistered, before closing the stream.

It is unregistered in AudioStreamRecord::release_l(). That is only for the error callback. There is also the issue of the data callback. I am looking at refactoring this code.

philburk commented 4 years ago

See PR #970

gavv commented 4 years ago

Related crash on Android 10 (on same device after update): https://gist.github.com/gavv/b0f21f0137271534e072603a1630ce02

Also in onAudioDeviceUpdate(), this time with sanitizers disabled.

Happened after switching BT off/on.

Patch from #970 was NOT applied.

philburk commented 3 years ago

Thanks again for this report. We have identified several sources of late callbacks. One is that AudioTrack only joins it threads in its destructor. We have addressed these problems in R and S. Luckily these conditions can be avoid by forcing a stop() and sleeping for a few msec before the close.

rpattabi commented 3 years ago

Luckily these conditions can be avoid by forcing a stop() and sleeping for a few msec before the close.

Is this something app is expected to do prior to calling close()? That is, prior to calling close(), are we expected to call stop() and sleep for a few milliseconds? We thought oboe was doing this already.

gavv commented 3 years ago

Thanks again for this report. We have identified several sources of late callbacks. One is that AudioTrack only joins it threads in its destructor. We have addressed these problems in R and S.

Great, thanks!!

Is this something app is expected to do prior to calling close()? That is, prior to calling close(), are we expected to call stop() and sleep for a few milliseconds? We thought oboe was doing this already.

As far as I understand, this is already addressed by oboe in #967. @philburk Right?

philburk commented 3 years ago

Is this something app is expected to do prior to calling close()?

In Oboe 1.5.0, in the close() method we issue a stop and also pause for a few msec. If the stream is already stopped then the extra stop is a NOOP.

philburk commented 4 months ago

b/161914201 has been fixed with three CLs in Android R.