keymanapp / keyman

Keyman cross platform input methods system running on Android, iOS, Linux, macOS, Windows and mobile and desktop web
https://keyman.com/
Other
372 stars 102 forks source link

bug(android): getKeyboardInfo with invalid index: -1 #6703

Closed sentry-io[bot] closed 1 year ago

sentry-io[bot] commented 1 year ago

Sentry Issue: KEYMAN-ANDROID-75

getKeyboardInfo with invalid index: -1

    at dalvik.system.VMStack.getThreadStackTrace(VMStack.java)
    at java.lang.Thread.getStackTrace(Thread.java:1736)
    at com.tavultesoft.kmea.util.KMLog.LogError(KMLog.java:39)
    at com.tavultesoft.kmea.data.KeyboardController.getKeyboardInfo(KeyboardController.java:170)
    at com.tavultesoft.kmea.KMManager.getCurrentKeyboardInfo(KMManager.java:1747)
    at com.keyman.android.SystemKeyboard.onStartInput(SystemKeyboard.java:142)
    at android.inputmethodservice.InputMethodService.doStartInput(InputMethodService.java:2433)
    at android.inputmethodservice.InputMethodService$InputMethodImpl.startInput(InputMethodService.java:652)
    at android.inputmethodservice.InputMethodService$InputMethodImpl.dispatchStartInputWithToken(InputMethodService.java:688)
    at android.inputmethodservice.IInputMethodWrapper.executeMessage(IInputMethodWrapper.java:199)
    at com.android.internal.os.HandlerCaller$MyHandler.handleMessage(HandlerCaller.java:44)
    at android.os.Handler.dispatchMessage(Handler.java:106)
    at android.os.Looper.loop(Looper.java:257)
    at android.app.ActivityThread.main(ActivityThread.java:8192)
    at java.lang.reflect.Method.invoke(Method.java)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:626)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1015)
jahorton commented 1 year ago

Reference for part of that call stack:

https://github.com/keymanapp/keyman/blob/4a2cb367ef697b4ce91948eba51f0f7a89dc21a7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/KMManager.java#L1745-L1748

Upon initial digging, it's pretty clear that index = -1 in this case. After all...

https://github.com/keymanapp/keyman/blob/4a2cb367ef697b4ce91948eba51f0f7a89dc21a7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L160-L172

So yeah. Definitely a value of -1. For whatever reason, getCurrentKeyboardInfo is clearly operating under the assumption that a valid index will be returned, but it's not happening.

My current running theory: if there is no current keyboard somehow, null won't show up in the keyboards array and thus results in the index of -1. Whether or not I'm wrong... clearly we need some sort of fallback behavior here, likely to select the first index.

Part of why: for the frame before it in the call-stack...

https://github.com/keymanapp/keyman/blob/4a2cb367ef697b4ce91948eba51f0f7a89dc21a7/android/KMAPro/kMAPro/src/main/java/com/keyman/android/SystemKeyboard.java#L115-L149

So, this is happening when the system keyboard is getting loaded for the first time, during initialization. It's possible that it's simply too early in the process for the active keyboard to have been set. Fortunately... the code asking about the keyboard's info just passively skips the affected operation set. This makes me wonder if we have any issues about the prediction banner acting funny or going missing on the initial load of the system keyboard... but I don't see evidence of such an issue in the repo.

mcdurdin commented 1 year ago

Fortunately... the code asking about the keyboard's info just passively skips the affected operation set.

Except that this crashes the app.

Please see the fix I've already written in #6704.

jahorton commented 1 year ago

Please see the fix I've already written in #6704.

From said fix's description:

Note that this addresses the error at hand but as I am unable to reproduce the issue, does not necessarily address the root problem, so there may still be other issues reported even after this is fixed.

I found myself with some extra time to try looking into the root problem, so I did a little digging. I'd rather find the source if it's easy enough to do so.

jahorton commented 1 year ago

Except that this crashes the app.

Are we sure about this? Granted, the Sentry logs for Android don't exactly make clear when the app or keyboard is truly crashed or if it was a logged error, but...

https://github.com/keymanapp/keyman/blob/4a2cb367ef697b4ce91948eba51f0f7a89dc21a7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L160-L172

Note line 170 - that's quite clearly a logged error, with no exception thrown. Given that there's no pressing issue in the repo or on the Community site we can point to as an analogue, I think it's just that - a logged error. We should know something if it were an egregious crash of the keyboard.

For verification, here's the definition of KMLog.LogError:

https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMEA/app/src/main/java/com/tavultesoft/kmea/util/KMLog.java#L29-L42

jahorton commented 1 year ago

Then again, maybe it does make it obvious. I just didn't notice how until now.

image

Different Sentry log, of course, but I think it's clear that that one is a crash.

jahorton commented 1 year ago

~tl;dr: I believe we have a race condition on our hands.~

(Further investigation, as of https://github.com/keymanapp/keyman/issues/6703#issuecomment-1151864990, has all but disproven this.)


Further investigation - this is the expected codepath for system keyboard's initialization of the initial keyboard:

https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMEA/app/src/main/java/com/tavultesoft/kmea/KMManager.java#L2458-L2481

Note this section in particular:

https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMEA/app/src/main/java/com/tavultesoft/kmea/KMManager.java#L2476-L2481

This code is called whenever the keyboard host page has finished loading.


Note how this, by its nature, is almost certainly an asynchronous callback from the WebView; I know this to be the case for the iOS app. Likewise, the true entrypoint for the system keyboard is onCreate, with onStartInput being called a little later by the system.

So, my guess is that we have two asynchronous methods operating in parallel, with one of them expected to finish before the other. But... is there anything guaranteeing this? Is this a race condition?

jahorton commented 1 year ago

As an aside, an interesting view from the Sentry issue:

image

There are a number of users with single-digit event counts on the following (tabulated) page.

jahorton commented 1 year ago

Continuing the investigation, onCreate calls Manager.initialize(), which in turn calls KeyboardController.initialize(). This call is not wrapped in a conditional, so the call will always chain through. That, in turn, marks the KeyboardController as initialized, as needed here:

https://github.com/keymanapp/keyman/blob/4a2cb367ef697b4ce91948eba51f0f7a89dc21a7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L160-L172

However, that same function... doesn't actually initialize the field for tracking the active keyboard. It certainly loads all data for installed keyboards, I'll give you that. But... there's no call to set the keyboard's currentKeyboard field there. Largely because it's a different class, of course - the KMKeyboard class tracks that field, not KeyboardController.

KeyboardController.initialize() definition, for reference. https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L49-L113
jahorton commented 1 year ago

An important note for any potential reproduction attempts:

Alternatively... just don't set it as the default system keyboard from the app.

  1. With the app terminated...
  2. Go to the device's Settings > System > Languages > Languages & Input > Virtual Keyboard > Manage Keyboards
  3. If Keyman is currently toggled on, toggle it off.
  4. Then toggle it on.

This will ensure that the manufacturer-default keyboard has a globe key for keyboard swapping, allowing you to swap to the Keyman system keyboard. The first time you do so, with the app currently terminated, you'll be able to test how the system keyboard initializes when set to be immediately active.

... not that I've actually achieved a full repro yet; I still haven't reproduced the actual log message when testing against beta. I have been able to delay the host page's init, at least, and that gave me enough info to discern the information above.

That said... there are logs on Sentry that strongly suggest that a few select users have a far more consistent trigger; their installations will produce log messages at or over 3 times a minute sometimes. It's hard to say what that is without further investigation, though I'm pretty sure that the rate above indicates that it's not happening from just fresh-boot loads of the keyboard.

jahorton commented 1 year ago

It's taken a few passes and some serious debug-logging, but I'm starting to make better sense of some of the codepaths involved now...

https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMEA/app/src/main/java/com/tavultesoft/kmea/KMManager.java#L2458-L2469

Line 2471 there is the only line in the codebase that sets SystemKeyboardLoaded = true. This condition is absolutely necessary for a critical condition seen here:

https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMAPro/kMAPro/src/main/java/com/keyman/android/SystemKeyboard.java#L131-L145

Line 142 there...

https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMAPro/kMAPro/src/main/java/com/keyman/android/SystemKeyboard.java#L142

is essentially

} else if (SystemKeyboardLoaded) {

Therefore... the host page must have loaded for us to reach line 145...

https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMAPro/kMAPro/src/main/java/com/keyman/android/SystemKeyboard.java#L145

which is the line seen in the Sentry logs' call stack.

But... is there anything guaranteeing this? Is this a race condition?

Well, I found the "anything": it's that SystemKeyboardLoaded variable. So either we have multiple threads in operation... or there is no race condition after all. Which then re-raises the question of where this -1 might be coming from.

And after a bit more investigation... all lines seen above appear to be executed on the same thread, according to my investigation via Android Studio's debugger. Huh.

jahorton commented 1 year ago

Oh, and other good news - where I'd thought we might have a scenario disrespecting 'privacy mode' for password entry, the same investigation leading to the previous comment's notes has also demonstrated that such a scenario (fortunately) doesn't exist. The active keyboard isn't even referenced when the "prediction override" toggle is set. It just took a number of passes through the codebase to recognize that toggle's related logic for what it was.

jahorton commented 1 year ago

OK, so I've been trying from a new angle. I'm using a couple of recently discovered bugs to try to repro this one.

By using both of these bugs, it's possible to get the app into a state best represented by the following screenshots:

Screen Shot 2022-06-10 at 3 07 53 PM

Screen Shot 2022-06-10 at 3 08 08 PM

I can safely return from Settings without the keyboard resetting.

It is possible to leave English installed (sil_euro_latin) instead of Pinyin (sil_cameroon_azerty), which would actually be an even more pronounced difference.

Even then, for our current scenario... by the way our mobile apps work internally, the displayed keyboard uses a different keyboard-language pairing than the only one currently installed. Therefore, if my understanding is correct, the currently active keyboard will not be found in the app's list of installed keyboards. (Yes, even if it's using the same keyboard ID.) From here, this may take some exploration, but this should be a good foundation, usable somehow to get at least one form of repro for this bug.

Attempts:

It's also possible that I should try deleting both the installed pairings and just leave the default EuroLatin in place, as I can do so while maintaining the Baka installation initially. That said, some of the failsafes in the code attempt to fall back on the default keyboard; I was hoping to get far enough to trip over even those aspects, as it is possible to uninstall the default keyboard from the app.

jahorton commented 1 year ago
  • Using the system menu approach to activate the system keyboard... it loads the lone remaining installation (for Pinyin, as seen in Settings) without issue or notable log message.
    • Likely because there was a semblance of a prior keyboard index, even if technically for a different language pairing.
    • When initializing, it operates by index first. So, it would seem that there is an index value once the state above is reached.

Well... turns out this particular avenue doesn't appear to be viable for a repro attempt.

https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMEA/app/src/main/java/com/tavultesoft/kmea/KMManager.java#L2458

https://github.com/keymanapp/keyman/blob/adb6ed94ffb67f3290ada37ec211d50b8493b93a/android/KMEA/app/src/main/java/com/tavultesoft/kmea/KMManager.java#L2481-L2491

With the repro attempt steps listed above (using other bugs to try reproducing this one), we actually reach line 2490 there; the system defaults back to index 0 when the originally-specified index returned no keyboard info. But, this occurs before any call that would result in this issue's error... ensuring that afterward, a valid index exists.

sentry-io[bot] commented 1 year ago

Sentry issue: KEYMAN-ANDROID-1D7

Compare with:

Sentry Issue: KEYMAN-ANDROID-1D6

getKeyboardIndex failed for key en_sil_euro_latin

Note how the event lists for both issues match nigh-perfectly, including their timestamps. Both are from new log messages added by #6761 to help diagnose this issue.

sentry-io[bot] commented 1 year ago

Sentry issue: KEYMAN-ANDROID-1DK

See also:

Sentry issue: KEYMAN-ANDROID-1DN

Compare their event lists; same notes as with the previous comment.

jahorton commented 1 year ago

So, a minor update, given the logging added in #6761:

image

That's almost an absolutely perfect correlation between the two error-log events.

jahorton commented 1 year ago

For that to happen, it's worth noting that the basic getKeyboardIndex preconditions are clearly met:

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L222-L249

Lines 225, 229, and 248 all return -1 if reached, but only the last of the 3 emits the getKeyboardIndex failed ... log / Sentry event. Yet, we're clearly making it past this block:

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L222-L230


I point all of the above details out because of how the two Sentry events relate:

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/KMManager.java#L2003-L2016

Any of the three lines noted above (that return the -1 value) would meet the condition for this block's line 2005. Yet it's nigh-exclusively the last option that occurs. (I'm not sifting through 3000+ event reports to verify that there's literally no case where the other two lines occur.)


So, what does that tell us?

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L224-L227

Passing the block above means that:

  1. The class has initialized.
  2. The class property/field list holds a valid reference - there is a live array.

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L234-L244

Again, list has a live, non-null instance - we'd have an exception from line 235 otherwise. But, for whatever reason, the list does not contain (or, "no longer contains"?) the keyboard being requested.

In the logs, I see that this sometimes arises when the app is resumed after being paused, and sometimes occurs when the device "wakes up" after having been locked / "put to sleep".

jahorton commented 1 year ago

Just a thought here, but...

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L237

Is it possible that case sensitivity is at play here? This code, as written, does assume that case sensitivity is not a factor - that some form of casing standardization has already been applied. And we've run into situations that arose from a lack of enforcement in other areas. I remember it being a factor in #6733.

Looking into the code a bit, I can find

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/packages/PackageProcessor.java#L72-L73

That suggests that the first half of the condition is probably safe. But for the second half... I don't see a clear, definitive answer.

I guess there's also

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L238

to look into as well in this regard, though I think that case is sufficiently handled:

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/util/BCP47.java#L17-L23

jahorton commented 1 year ago

Following up on the #6732 angle (which was kinda tied to #6733)...

The PR that fixed #6732 - that is, #7239 - had this to say early on in its description:

Currently, if a keyboard package installation involves selecting language(s), the default language tag isn't being standardized (lower-cased). This causes the temporary language list to contain duplicates of the same language (e.g. bkc-latn and bkc-Latn).

So... are we sure that this only ever affected the temporary list and never the permanent one?

Is it possible that after fixing previous case-insensitivity bugs like this... that we never actually re-validated the language list and/or prior keyboard setting? If user settings weren't updated to ensure standardized casing, and thus unstandardized forms of the keyboard ID live on for a select few devices & users... perhaps that could be our underlying cause? It is kinda noteworthy just how few users this bug affects, especially given the frequency at which it affects the users who do experience it.

Worth noticing: the core Keyboard object does not enforce a standardized casing for the keyboard ID:

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/Keyboard.java#L56-L62


If this line of thinking is correct... maybe just tweaking

https://github.com/keymanapp/keyman/blob/dd1d166e14aa6a6bd09790bc6812ea221b918bf7/android/KMEA/app/src/main/java/com/tavultesoft/kmea/data/KeyboardController.java#L237

to use equalsIgnoreCase() instead of equals() would be enough? We aren't getting outright crashes from this, at least.

Of course, it's probably best to fully patch everything up, lest we have this assumption be invalidated somewhere else in the future again by mistake.

@darcywong00: thoughts?

While I personally am not 100% convinced on this... there's that old, famous maxim about eliminating the impossible:

Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.

Arthur Conan Doyle

I can't say that literally everything impossible has been eliminated, but this is the first angle of "possible" I've been able to find for this issue given all the other things we know thus far.

jahorton commented 1 year ago

Found something interesting while analyzing Sentry error reports today:

image

https://sentry.io/share/issue/bed2060d7ba24e44854b8f70078d6229/

We have proof of at least one wild mixed-case scenario, unless my interpretation's off somehow?

Either way, this is clearly within the Keyman for Android app (or engine), and the language ID lasted long enough to be passed off to KMW, as this error came from the Web side.

jahorton commented 1 year ago

From https://github.com/keymanapp/keyman/pull/7415#issuecomment-1272733950

I think it's unlikely that the keyboardid or packageid (aka keyboard and package filenames) would have mixed case [...] but all keyboards and packages hosted on Keyman Cloud will have lower case IDs, and the compiler complains about mixed case for these ids.

... I can't help but notice now, especially given the odd frequency of related error logs talking about sil_euro_latin... from the fv_all package... that we may need to double-check what guards are in place for installations from ad-hoc (user-to-user .kmp transfer) cases.

mcdurdin commented 1 year ago

I'm confused -- why? sil_euro_latin and fv_all are both lower case.

jahorton commented 1 year ago

Because of the language IDs in the package, not the package IDs.

mcdurdin commented 1 year ago

Yes, language ID comparisons should always have been case-insensitive.

darcywong00 commented 1 year ago

I think this can be closed now. getKeyboardInfo correctly returns the invalid index -1 when generating the language list for "Add languages to installed keyboard".

The languages on that list are not currently installed.