Closed gavv closed 3 years ago
This sounds like #820. Probably a "use after free". To fix this we added openStream(shared_ptr<>). See release notes for 1.4.2: https://github.com/google/oboe/releases/tag/1.4.2
std::shared_ptr<oboe::AudioStream> oboeStream; // hold onto this, maybe in an instance variable
Result result = builder.openStream(oboeStream);
@philburk Actually I already experienced #820 before this issue, found your PR, and fixed it by updating to oboe 1.4.2 and using shared stream. And this one is some another crash with different backtrace. And it's reproducing when using shared stream.
After doing more tests, I've encountered another, probably related, crash:
https://gist.github.com/gavv/b4222d9600d6ac3dab95a37e3b26c96d
Previously, I was disabling Bluetooth or powering off BT headset, my app was receiving device removal event, and was trying to close oboe streams immediately, and crashed.
This time I did a bit different test. I was stopping/starting BT SCO and oboe streams repeatedly, and after one of the restarts headset disappeared (likely because "AudioFlinger server died" from logs). It happens from time to time, I don't know why, but I'm trying to make the app robust to this situation.
So Bluetooth was enabled, BT SCO was enabled, headset was enabled, but it disappeared from device list temporary and my app received device removal event.
This time, instead of trying to close oboe streams immediately, my app waited for 3 seconds until oboe streams would report failure. However that didn't happen and after 3 seconds it started closing the streams.
It stopped and closed both streams, released shared pointers, and destroyed my AudioStreamCallback implementation. Then the app disabled BT SCO.
Then onErrorBeforeClose callback was invoked on a destroyed object, and it crashed.
Maybe the callback was triggered by disabling BT SCO, or maybe it was triggered but earlier call to onAudioDeviceUpdate() during close, I'm not sure (see logs).
The reason why I think the second crash is related is that in both crashes there are onAudioDeviceUpdate() calls, and sanitizer complains when we're still inside that function, before my code is even called.
I've tried to delay deletion of my AudioStreamCallback implementation and oboe stream. Now, after closing oboe stream, I don't delete them immediately, but instead schedule deletion after 5 minutes.
With this approach, sanitizer doesn't complain for my own code any more, but it still sometimes complains for errors in AudioStreamLegacy.cpp. A bit later after the complaint, I'm getting yet another crash, maybe caused by memory corruption.
07-22 17:19:55.208 19891 19891 AudioManager I In stopBluetoothSco(), calling application: tech.boring.boringaudio
07-22 17:19:55.208 2302 3561 AudioService I In stopBluetoothSco()
07-22 17:19:55.208 2302 3561 AudioService I In decCount(), mStartcount: 1
07-22 17:19:55.209 2302 3561 AudioService I In requestScoState(), state: 10, scoAudioMode: 0
07-22 17:19:55.209 2302 3561 AudioService I In checkScoAudioState(), mScoAudioState: 3
07-22 17:19:55.209 2302 3561 AudioService I In totalCount(), count: 0
07-22 17:19:55.209 2302 3561 AudioService I In disconnectBluetoothScoAudioHelper(), scoAudioMode: 0, bluetoothHeadset: android.bluetooth.BluetoothHeadset@e56199e, BluetoothDevice: BC:B8:63:46:6E:85
07-22 17:19:55.209 2302 3561 AudioService I In disconnectBluetoothScoAudioHelper(), calling stopScoUsingVirtualVoiceCall()
07-22 17:19:55.211 19891 20010 OboeAudio D oboe_aaudio_error_thread_proc() - entering >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<<< here I receive a callback on oboe stream for which I already called close() and it returned >>>
07-22 17:19:55.211 19891 20010 BoringAudio:Core D ignoring onErrorBeforeClose() callback: stream is closed
07-22 17:19:55.211 19891 20010 OboeAudio D oboe_aaudio_error_thread_proc() - exiting <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
07-22 17:19:55.212 19886 19886 wrap.sh I frameworks/av/media/libaaudio/src/legacy/AudioStreamLegacy.cpp:45:39: runtime error: control flow integrity check for type 'aaudio::AudioStreamLegacy' failed during cast to unrelated type (vtable address 0x007c3c00004c)
07-22 17:19:55.212 19886 19886 wrap.sh I 0x007c3c00004c: note: invalid vtable
07-22 17:19:55.212 19886 19886 wrap.sh I 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
07-22 17:19:55.212 19886 19886 wrap.sh I ^
07-22 17:19:55.214 19891 19891 AudioManager I In setBluetoothScoOn(), on: false, calling application: tech.boring.boringaudio
07-22 17:19:55.214 2302 7602 AudioService D setBluetoothScoOn Permission Denial for pid = 19891, uid = 10338
07-22 17:19:55.214 2302 7602 AudioService I In setBluetoothScoOn(), on: false. The calling application Uid: 10338, is greater than FIRST_APPLICATION_UID exiting from setBluetoothScoOn()
07-22 17:19:55.214 2302 7602 AudioService I In setbluetoothScoOn(), mForcedUseForCommExt: 0
07-22 17:19:55.265 19886 19886 wrap.sh I frameworks/av/media/libaaudio/src/legacy/AudioStreamLegacy.cpp:45:39: note: check failed in /system/lib64/libaaudio.so, vtable located in [anon:cfi shadow]
07-22 17:19:55.265 19886 19886 wrap.sh I SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior frameworks/av/media/libaaudio/src/legacy/AudioStreamLegacy.cpp:45:39 in
................................................
07-22 17:19:55.593 19891 19891 AudioTrack V Building AudioTrack with attributes: usage=2 content=1 flags=0x0 tags=[]
07-22 17:19:55.611 20015 20015 DEBUG F *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-22 17:19:55.612 20015 20015 DEBUG F Build fingerprint: 'xiaomi/lavender/lavender:9/PKQ1.180904.001/V11.0.6.0.PFGMIXM:user/release-keys'
07-22 17:19:55.612 20015 20015 DEBUG F Revision: '0'
07-22 17:19:55.612 20015 20015 DEBUG F ABI: 'arm64'
07-22 17:19:55.612 20015 20015 DEBUG F pid: 19891, tid: 19934, name: AudioTrack >>> tech.boring.boringaudio <<<
07-22 17:19:55.612 20015 20015 DEBUG F signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
07-22 17:19:55.612 20015 20015 DEBUG F x0 0000000000000000 x1 0000000000004dde x2 0000000000000006 x3 0000000000000008
07-22 17:19:55.612 20015 20015 DEBUG F x4 0000000000000000 x5 0000000000000000 x6 0000000000000000 x7 0000000000000000
07-22 17:19:55.612 20015 20015 DEBUG F x8 0000000000000083 x9 dcb884734a20c6f6 x10 0000000000000000 x11 fffffffc7ffffbdf
07-22 17:19:55.612 20015 20015 DEBUG F x12 0000000000000001 x13 0000007bf9b9f4e0 x14 0000000000000000 x15 0000007c97939b5c
07-22 17:19:55.612 20015 20015 DEBUG F x16 0000007c9a9332b0 x17 0000007c9a852cd0 x18 0000000000000000 x19 0000000000004db3
07-22 17:19:55.612 20015 20015 DEBUG F x20 0000000000004dde x21 5fc5d1b81fb5fcd3 x22 0000007c9cc89010 x23 000000000000e770
07-22 17:19:55.612 20015 20015 DEBUG F x24 0000007c99daa7b2 x25 0000007c99dab61e x26 0000000000000540 x27 0000000000000300
07-22 17:19:55.612 20015 20015 DEBUG F x28 0000007c101cc15c x29 0000007bf9ba0140
07-22 17:19:55.612 20015 20015 DEBUG F sp 0000007bf9ba0100 lr 0000007c9a846144 pc 0000007c9a84616c
07-22 17:19:55.621 20015 20015 DEBUG F backtrace:
07-22 17:19:55.621 20015 20015 DEBUG F #00 pc 000000000002216c /system/lib64/libc.so (abort+116)
07-22 17:19:55.621 20015 20015 DEBUG F #01 pc 000000000005a6d8 /data/app/tech.boring.boringaudio-ISJ-bdL0R5Mrp4TqRNudsQ==/lib/arm64/libclang_rt.asan-aarch64-android.so (offset 0x50000)
07-22 17:19:55.621 20015 20015 DEBUG F #02 pc 0000000000059670 /data/app/tech.boring.boringaudio-ISJ-bdL0R5Mrp4TqRNudsQ==/lib/arm64/libclang_rt.asan-aarch64-android.so (offset 0x50000)
07-22 17:19:55.621 20015 20015 DEBUG F #03 pc 00000000000640c4 /data/app/tech.boring.boringaudio-ISJ-bdL0R5Mrp4TqRNudsQ==/lib/arm64/libclang_rt.asan-aarch64-android.so (offset 0x50000) (__ubsan_handle_cfi_check_fail_abort+128)
07-22 17:19:55.621 20015 20015 DEBUG F #04 pc 000000000002606c /system/lib64/libaaudio.so
07-22 17:19:55.621 20015 20015 DEBUG F #05 pc 000000000002a574 /system/lib64/libaaudio.so (__cfi_check+5492)
07-22 17:19:55.621 20015 20015 DEBUG F #06 pc 0000000000024584 /system/bin/linker64 (__dl__ZN15CFIShadowWriter7CfiFailEmPvS0_S0_+128)
07-22 17:19:55.621 20015 20015 DEBUG F #07 pc 0000000000001140 /system/lib64/libdl.so (__cfi_slowpath_diag+64)
07-22 17:19:55.621 20015 20015 DEBUG F #08 pc 000000000002f820 /system/lib64/libaaudio.so (AudioStreamLegacy_callback(int, void*, void*) [clone .cfi]+272)
07-22 17:19:55.621 20015 20015 DEBUG F #09 pc 000000000007887c /system/lib64/libaudioclient.so (android::AudioTrack::processAudioBuffer()+4000)
07-22 17:19:55.621 20015 20015 DEBUG F #10 pc 00000000000775cc /system/lib64/libaudioclient.so (android::AudioTrack::AudioTrackThread::threadLoop()+252)
07-22 17:19:55.621 20015 20015 DEBUG F #11 pc 0000000000010074 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+180)
07-22 17:19:55.621 20015 20015 DEBUG F #12 pc 0000000000090328 /system/lib64/libc.so (__pthread_start(void*)+36)
07-22 17:19:55.621 20015 20015 DEBUG F #13 pc 0000000000023a28 /system/lib64/libc.so (__start_thread+68)
I think it confirms that free-after-use or similar issue happens with some lower-level object, not with my callback and not with oboe stream.
Maybe onAudioDeviceUpdate() is not properly synchronized?
BTW note that I got all these crashes during quite intensive "stress testing" - I randomly and repeatedly stopped and started my stream, Bluetooth on phone, and the headset (by putting AirPods into the box and getting them back). All these crashes are rare enough.
Also note that every time I stop the stream, I also stop BT SCO, and then I start it again before starting the stream. Maybe it's not optimal, but for now I keep this code because probably it helps to reproduce the crashes (I don't know).
It stopped and closed both streams, released shared pointers, and destroyed my AudioStreamCallback implementation. Then the app disabled BT SCO. Then onErrorBeforeClose callback was invoked on a destroyed object, and it crashed.
Are you saying that onErrorBeforeClose callback was invoked by a stream that had already been closed! That should not happen and would be a big problem. Have you only seen this on Android 9? Which devices and versions show this problem?
@gavv - I am trying to reproduce this error on P using a BT headset.
Do you think reproducing the crash requires opening both INPUT and OUTPUT streams.
my app was receiving device removal event,
Were you using AudioManager.registerAudioDeviceCallback() for that?
@gavv - When are you calling startBluetoothSCO() in relation to opening the stream?
How are you choosing the SCO device? Are you letting Oboe/AAudio select the device or do you look for a SCO device in the AudioDeviceInfo array?
Are you saying that onErrorBeforeClose callback was invoked by a stream that had already been closed!
It seems yes, at least in the last (third) crash.
Have you only seen this on Android 9? Which devices and versions show this problem?
So far I tested it only on one phone (Android 9, Xiaomi Redmi Note 7) on which I'm doing initial development of the app.
If it helps, I can also test it on Android 8 if it helps. For now I don't have access to other devices.
Do you think reproducing the crash requires opening both INPUT and OUTPUT streams.
I'm always using two streams, but I'm not sure if it's important. I can test it with one stream if it helps.
Were you using AudioManager.registerAudioDeviceCallback() for that?
Yes.
When are you calling startBluetoothSCO() in relation to opening the stream?
I call startBluetoothSco, wait SCO_AUDIO_STATE_CONNECTED event, then create and start oboe streams.
When I receive device removal event, I tried several approaches:
1) first crash: immediately stop, close, and destroy oboe streams, then call stopBluetoothSco() and setBluetoothScoOn(false)
2) second crash: wait for onErrorBeforeClose with ErrorDisconnected; wait until my callback returns; stop, close, and destroy oboe streams; then call stopBluetoothSco() and setBluetoothScoOn(false)
3) third crash: wait for onErrorBeforeClose with ErrorDisconnected; wait until my callback returns; stop and close oboe streams; don't destroy them; then call stopBluetoothSco() and setBluetoothScoOn(false)
How are you choosing the SCO device? Are you letting Oboe/AAudio select the device or do you look for a SCO device in the AudioDeviceInfo array?
I get IDs from AudioDeviceInfo and pass them to oboe. You can see those IDs in logs above.
Thanks for looking into this! I'll be offline (mostly) a few days. I can provide more info or prepare a reproducer on next week.
Thanks! Those details are very helpful. I am now seeing a crash that may be related.
Also please, what are you using for sampleRate, channelCount and inputPreset?
Note that setBluetoothScoOn() "should only be used by applications that replace the platform-wide management of audio settings or the main telephony application." Is that true for your app?
https://developer.android.com/reference/android/media/AudioManager#setBluetoothScoOn(boolean)
I have been doing a lot of testing on this and found that if a legacy AAudio stream is not stopped then a data callback, and possibly an error callback, can occur after the stream has been closed and deleted. This is more likely to occur when a routing change is occurring.
If an app is closing and restarting streams from two threads, one Java and one triggered by an error callback, then a race condition in the application could cause this error. Consider the following sequence:
JNI Thread , Error Callback
stop()
close()
open()
stop()
start()
close()
late callback => crash!
The stream is running when the close is called, causing the crash.
To avoid this problem, the app can make sure the stop and close occur together by putting them all under a single lock:
{
scoped lock
stop()
close()
open()
start()
}
The app may wish to do all stream management under a single thread. The Java code and the error callback can send messages to that thread so that everything is coherent. I will provide an example of this. #960
In Oboe, we can force a stop from the close() method under a lock.
Sorry for delay, I'm back.
Also please, what are you using for sampleRate, channelCount and inputPreset?
I'm using device-native parameters:
playback stream: device=30 api=AAudio dir=Output format=I16 channelCount=2 sampleRate=48000
sharingMode=Shared perfMode=LowLatency usage=VoiceCommunication contentType=Speech
inputPreset=VoiceRecognition samplesPerCallback=0
recording stream: device=29 api=AAudio dir=Input format=I16 channelCount=2 sampleRate=48000
sharingMode=Shared perfMode=LowLatency usage=VoiceCommunication contentType=Speech
inputPreset=Unprocessed samplesPerCallback=0
Note that setBluetoothScoOn() "should only be used by applications that replace the platform-wide management of audio settings or the main telephony application." Is that true for your app?
Oops, setBluetoothScoOn() is unneeded here. I removed it and nothing changed. I only called setBluetoothScoOn(false)
after stopBluetoothSco()
, so maybe it had no effect at all.
If an app is closing and restarting streams from two threads, one Java and one triggered by an error callback, then a race condition in the application could cause this error. Consider the following sequence:
Hmm.
Here is how I was closing streams in, for example, third test above:
All manipulations with oboe streams, except data callback, are performed under a mutex. The mutex protects both streams. I always stop and close both streams and only then open two new streams.
Also, it seems that the crash happens before creating and opening new streams. The stream is stopped and closed, then the error callback is called on closed stream, and I didn't try to create a new one yet.
@philburk I've prepared a minimal app reproducing the problem. Although it's rather small, my customer is more comfortable with not making it fully public, so I've pushed it to a private repo on github and sent you an invite. Hope this is OK.
The app has two buttons, start and stop. Start button selects device IDs, starts BT SCO, and creates two oboe streams. Stop button closes and stops oboe streams and stops BT SCO.
The app never destroys oboe streams. This memory leak is intentional. It allows us to print a log message when callback is called after close, instead of crashing. The app still crashes, but not in our code.
Also, the app monitors device manager events and automatically invokes start when a BT SCO device appears and stop when it disappears. Actually, to reproduce the problem, you don't even need to press start and stop buttons manually. They are needed just for convenience.
I can reproduce two variations of the problem using this example app:
This github gist provides the logs for both issues: https://gist.github.com/gavv/5eda6ae760c2eb979c34efb0e64eb272
For callback-after-close issue, search for these lines:
08-01 21:32:25.259 21896 22315 D ExampleApp: got onErrorBeforeClose() on input stream for device 1276 with error ErrorDisconnected
08-01 21:32:25.259 21896 22315 D ExampleApp: !!!!!!!! got error callback on closed stream !!!!!!!!
Here is how to reproduce the problem:
Eventually, the app will crash after step 3 and you'll see a crash and possibly also callback-after-close message in logs.
When I keep the call to randomDelay()
in MainActivity commented out, I get a crash once per 5-10 iterations, but never get callback-after-close. When I uncomment randomDelay()
in MainActivity, I get callback-after-close once per 15-20 iterations.
To build and deploy the app, you can just run build.sh
and deploy.sh
. The build script uses docker to freeze versions of SDK and NDK. The very first build is very slow.
The app comes with clang UB sanitizer enabled. I'm using the debug version of the APK.
Thank you! Being able to reproduce the problem is very important. I have found the invitation and am looking at the repo. I really appreciate this.
Thanks gavv. I am able to reproduce the crash consistently with your test program when running P. I tried many times to crash on the current master and on QQ3A and it did not crash.
I could only reproduce this when using Bluetooth
I then tried calling usleep(N*1000) before the stream is closed. N. crashes:tries 10 => 0:10 5 => 0:10 1 => 1:10 in onDeviceUpdate() no => 8:11
So it is a pretty tight race. It seems a short delay right before the close is enough to prevent the race. Five msec works OK but ten msec should be a safer value. I will add these delays to Oboe.
Thanks! So, presumably, this is indeed a race in AAudio? And we don't know whether it's fixed or just not reproducing on master, right?
Should we report it? I'm not familiar with the source code, but after quick look, I didn't find a place where AudioStreamLegacy waits until onDeviceUpdate is exited and unregistered, before closing the stream.
And we don't know whether it's fixed or just not reproducing on master, right?
Right. That is the important question. I am looking at old bugs and change logs for this. It is being tracked internally as b/161914201
I didn't find a place where AudioStreamLegacy waits until onDeviceUpdate is exited and unregistered, before closing the stream.
It is unregistered in AudioStreamRecord::release_l(). That is only for the error callback. There is also the issue of the data callback. I am looking at refactoring this code.
See PR #970
Related crash on Android 10 (on same device after update): https://gist.github.com/gavv/b0f21f0137271534e072603a1630ce02
Also in onAudioDeviceUpdate(), this time with sanitizers disabled.
Happened after switching BT off/on.
Patch from #970 was NOT applied.
Thanks again for this report. We have identified several sources of late callbacks. One is that AudioTrack only joins it threads in its destructor. We have addressed these problems in R and S. Luckily these conditions can be avoid by forcing a stop() and sleeping for a few msec before the close.
Luckily these conditions can be avoid by forcing a stop() and sleeping for a few msec before the close.
Is this something app is expected to do prior to calling close()? That is, prior to calling close(), are we expected to call stop() and sleep for a few milliseconds? We thought oboe was doing this already.
Thanks again for this report. We have identified several sources of late callbacks. One is that AudioTrack only joins it threads in its destructor. We have addressed these problems in R and S.
Great, thanks!!
Is this something app is expected to do prior to calling close()? That is, prior to calling close(), are we expected to call stop() and sleep for a few milliseconds? We thought oboe was doing this already.
As far as I understand, this is already addressed by oboe in #967. @philburk Right?
Is this something app is expected to do prior to calling close()?
In Oboe 1.5.0, in the close() method we issue a stop and also pause for a few msec. If the stream is already stopped then the extra stop is a NOOP.
b/161914201 has been fixed with three CLs in Android R.
Android version(s): 9 Android device(s): Xiaomi Redmi Note 7 Oboe version: 1.4.2
Hi,
First of all, sorry if I'm reporting this to a wrong tracker. I'm still not 100% sure if this is Oboe bug, AAudio bug, or bug in my own code.
My app uses Oboe with BT SCO headset (AirPods) and creates two oboe streams (input and output). It also subscribes AudioManager events (from Java SDK), and when device disappears, it calls
stream->stop()
, thenstream->close()
, and then destroysstream
; first for one stream, and then for another.(I'm using shared stream with data and error callbacks).
At the same time, concurrently with the second stream close call, onAudioDeviceUpdate() callback is invoked somewhere in background. Then close() returns and I destroy the stream. And then AudioStreamLegacy segfaults in background. Before the segfault, clang UB sanitizer reports integrity check failure.
It seems to me that close() doesn't wait until onAudioDeviceUpdate() is finished, or onAudioDeviceUpdate() doesn't check that the stream is already closed, or it fails if the stream is being closed during its invocation, or it invokes something (error callback?) that is not being waited by close(), etc.
Here are the logs:
Relevant lines are:
And here is the backtrace:
My own code looks sane to me: I just call close() and expect that it will wait everything that should be waited, and when it returns, I destroy the stream.
If my code is correct, I'm not sure whether it's related to Oboe or AAudio. Is there a chance that Oboe close() implementation is incomplete and doesn't wait for something?
Currently the reproducer is a rather large closed-source app. If my expectations to close() are correct and you think I'm using it right, and this may be related to Oboe, I can try to create a stand-alone reproducer. Please let me know if this makes sense.