square / whorlwind

Makes fingerprint encryption a breeze.
Apache License 2.0
815 stars 62 forks source link

Back-to-back calls to read() throws IllegalStateException #1

Open danh32 opened 8 years ago

danh32 commented 8 years ago

I'm seeing some java.lang.IllegalStateException: Already attempting to read another value. being thrown in the wild. I'm able to reproduce locally by rotating my device a couple of times with our FingerprintFragment up. As best as I can tell, there's a race condition between the CancellationSignal notifying the AuthenticationCallbacks which resets the AtomicBoolean and then the Activity/Fragment lifecycle's onPause and onResume.

I can also repro in SampleActivity by hardcoding immediate calls at the end of onResume:

    readSubject.onNext("sample");
    readSubject.onNext("sample");
danh32 commented 8 years ago

If anyone else is hitting this, our workaround is:

whorlwind.read()
    .retry((count, throwable) -> count < 4 && throwable instanceof IllegalStateException)
    ...
    .subscribe(...)
JakeWharton commented 8 years ago

Adding subscribeOn(mainThread()) might also do the trick. This will defer subscription to the next frame and hopefully give the callback time to happen.

On Wed, Mar 23, 2016, 7:56 PM Dan Hill notifications@github.com wrote:

If anyone else is hitting this, our workaround is:

whorlwind.read() .retry((count, throwable) -> count < 4 && throwable instanceof IllegalStateException) ... .subscribe(...)

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/square/whorlwind/issues/1#issuecomment-200587168

danh32 commented 8 years ago

Matt had me give this a shot. Didn't work for me, unfortunately.

JakeWharton commented 8 years ago

Can you use a retained fragment to retain the same Observable instance? We'd like to fix the larger problem, but ideally you remain connected to the OS for fingerprint auth during rotation as well. And if you use observeOn(mainThread()) it defers any notification that comes between onPause and onResume.

On Wed, Mar 23, 2016, 8:14 PM Dan Hill notifications@github.com wrote:

Matt had me give this a shot. Didn't work for me, unfortunately.

— You are receiving this because you commented.

Reply to this email directly or view it on GitHub https://github.com/square/whorlwind/issues/1#issuecomment-200592144

danh32 commented 8 years ago

I definitely can give it a shot. Is there any drawback to the .retry() approach? I'm only seeing the retry trigger a single time in practice, so it seems pretty low overhead. It's also a much less invasive fix for us right now.

mattprecious commented 8 years ago

The repro in the sample does throw the same exception, but that's intentional since it calls read() twice which would trigger the underlying Android bug.

danh32 commented 8 years ago

Shouldn't the .switchMap() unsub from the first call to .read() when it sees the second, though?

mattprecious commented 8 years ago

Ah, yeah. You are correct.

danh32 commented 8 years ago
Subscription s1 = whorlwind.read().subscribe();
s1.unsubscribe();

// how long of a wait here is safe?

Subscription s2 = whorlwind.read().subscribe();
s2.unsubscribe();

The time above is apparently less than the time it takes between Fragment.onPause() and .onResume(). It's fine if the code above shouldn't work as-is, but there's no good way currently to know how long to wait between the two calls. Not sure if there would be a good way to expose this?

JakeWharton commented 8 years ago

The wait is unknown. We trip the cancelation signal which makes a binder call to the OS which has to make a binder call back into the app to signal successful cancelation.

danh32 commented 8 years ago

Right. It's an unfortunate case where you're trying to save us from the platform bug, which is definitely great! This is just a bit of a sharp corner in the api here that I didn't expect. I initially expected that if I had unsubscribed from the previous .read() that it would be safe to make another call to .read(). Due to the asynchronous platform calls that's not the case. The only real issue here is that this duration can be longer than a onPause()/onResume() pair, which can lead to hitting this crash (which again is better than the platform bug we would otherwise be hitting in this case).

I'm not sure what a resolution here would look like. Perhaps just mention this case in the docs somewhere? Trying to delay the second call to FingerprintManager.authenticate() to wait for the system to call the previous AuthenticationCallbacks sounds pretty hairy. Would there be any other way to expose when it's safe again to call .read() again? I'm not convinced that there's a great way to do that. I'm definitely fine with our temp workaround for the short term. Long-term moving to not unsubscribing during the rotation sounds like a better approach.

danh32 commented 8 years ago

I think I've found an interesting point here: due to the platform bug, it's also possible for the lockscreen to steal whorlwind's callback to onAuthenticationError, which means that readerScanning never gets reset to false and every future call to .read() on this Whorlwind instance will throw the IllegalStateException.

Repro steps (can use latest version of Robinhood to repro): 1) Launch app that will call whorlwind.read() in the onResume() path. 2) turn device's screen off before app fully launches 3) fingerprint sensor is now locked and will not wake the device (Android bug) 4) hit power button and enter pin to wake device 5) App's onResume() call to .read() will throw IllegalStateException no matter how long we wait before the .read() call since readerScanning will always be true

edit: I can also repro with whorlwind's sample app by adding readSubject.onNext("sample"); to the end of onResume(). I'm only seeing this bug on a physical device, though, doesn't seem to repro on an emulator.

mattprecious commented 8 years ago

Ugh... that bug is the worst. Does the lockscreen register its listener before your onPause() is called?

Repro steps (can use latest version of Robinhood to repro):

:flag-ca:

danh32 commented 8 years ago

:flag-ca:

Haha, sorry!

Is there a log statement to look for in logcat that you're aware of for when the lockscreen registers? I'm sifting through a log now, but my device is very verbose :(

danh32 commented 8 years ago

Ah, I was actually just able to repro by turning the phone completely off and then running the app from Android Studio - so in that case the lockscreen would be registered before both onResume() and onPause()

danh32 commented 8 years ago

So I think my statements above about the lockscreen hijacking whorlwind's onAuthenticationError call are untrue.

After debugging through more, it seems like this is happening:

1) Lockscreen is up and listening for fingerprints 2) App launches in background calls whorlwind.read() in onResume(), and then immediately unsubscribes in onPause() 3) Whorlwind sets readerScanning 4) Whorlwind calls cancellationSignal.cancel() 5) Whorlwind calls fingerprintManager.authenticate() with an already cancelled signal and never gets any callbacks

alan-rodriguez commented 8 years ago

I can reproduce what @danh32 is seeing with the lock screen just by toggling the power button on the Nexus 6P. Once to turn off the device screen while my Activity is in the foreground, and then a quick two hit toggle about a second afterwards (I have quick launch of the Camera disabled). For whatever reason, this ends up resuming and then immediately pausing my Activity, even though it's still sitting behind the lock screen.

The workaround I came up with was to check that KeyguardManager.inKeyguardRestrictedInputMode() == false before calling Whorlwind.read() in onResume().

mattprecious commented 8 years ago

@alan-rodriguez, are you getting another onResume() call once the lock screen is unlocked?

alan-rodriguez commented 8 years ago

@mattprecious Yes, I do get another onResume() after the lock screen is unlocked and my activity returns to visibility.

But before that, I am seeing an cycle of onStart() -> onResume() -> onPause -> onStart while the screen is powered off and the device is still locked. This is when I hit the device/lock screen button once to power off the screen and then hit the button twice quickly within ~200ms to 800ms. The screen stays powered off, but I see the lifecycle callbacks occurring in logcat.

mattprecious commented 8 years ago

Alright, that's interesting. I was worried that onResume wasn't getting called another time after the phone was unlocked, which would prevent your app from starting up the fingerprint reader at the right time.

It sounds like that's a good solution for that case.

grandstaish commented 8 years ago

I have a pretty hacky idea for working around this if you're interested @mattprecious.

If readerScanning is true, you could authenticate a stub fingerprint callback, cancel it straight away, and consume that single (incorrect) error signal. Once the error is consumed, you can start up the real fingerprint manager authenticate call. That way your real callback will only receive the correct signals.

danh32 commented 8 years ago

I've found a bit more, here. If you subscribe and unsubscribe from the read() stream enough, eventually we'll hit a case where FingerprintAuthOnSubscribe calls cancellationSignal.cancel(), but we never receive a callback to onAuthenticationError to reset the readerScanning flag. I'm assuming this is due to the framework bug, but I'm not sure since there's nothing apparent that would be stealing "our" onAuthenticationError callback. Unfortunately, this behavior leaves the whorlwind instance unusable, since nothing will ever reset the readerScanning flag.

I'm able to get my app to crash by backgrounding and foregrounding it enough, and can get the whorlwind sample to crash by backgrounding, foregrounding and tapping 'read' enough times (6-8, generally).

danh32 commented 8 years ago

crash.txt patch.txt

Here's some logging output that's probably clearer than my description above, and the patch of the logging that was added.

grandstaish commented 8 years ago

@danh32 if you remove the logcat filter for your app, do you see the following logs from the system:

Rejecting your.package.name. ; not in foreground
cancelAuthentication(): reject your package name

Android won't call the fingerprint authentication callbacks if your app is backgrounded, and it seems like there's a race condition with when onPause() gets called and when the system thinks your app is backgrounded. At least that's what I'm seeing and struggling with currently. I see it on my own app as well as the google sample app.

As you say, whenever there is a missed error callback, Whorlwind becomes completely unusable without resetting that flag.

danh32 commented 8 years ago

@grandstaish Yep, I'm seeing those logs. It's interesting, since the code at https://android.googlesource.com/platform/frameworks/base/+/5e35422/services/core/java/com/android/server/fingerprint/FingerprintService.java seems to indicate that it only checks for the app being in the foreground when trying to authenticate, not when trying to cancel. I haven't dug into the history to see if that behavior has changed since the Marshmallow release.

danh32 commented 8 years ago

I'm bad at navigating AOSP, but found here: https://github.com/android/platform_frameworks_base/blob/master/services/core/java/com/android/server/fingerprint/FingerprintService.java#L764

Looks like you're right, race condition as to whether we're backgrounded or not.