keymanapp / keyman

Keyman cross platform input methods system running on Android, iOS, Linux, macOS, Windows and mobile and desktop web
https://keyman.com/
Other
372 stars 102 forks source link

bug(web): TypeError: Cannot read property 'layout' of null #6898

Closed sentry-io[bot] closed 10 months ago

sentry-io[bot] commented 1 year ago

Sentry Issue: KEYMAN-WEB-J

TypeError: Cannot read property 'layout' of null
  at handleClick (source/osk/preProcessor.ts:66:22)
  at raiseKeyEvent (source/osk/preProcessor.ts:44:22)
  at modelKeyClick (source/osk/visualKeyboard.ts:977:7)
  at inputEndHandler (source/osk/visualKeyboard.ts:727:11)
  at onInputEnd (source/osk/inputEventEngine.ts:55:9)
...
(3 additional frame(s) were not displayed)

Note that this error only happens within Keyman for Android. However it is an error raised by Keyman Engine for Web.

jahorton commented 1 year ago

Looking into this bug, it would appear that this arises with KMW believes that there is no active keyboard:

image

https://github.com/keymanapp/keyman/blob/94214c020b99b1b7f4df900c709c5f5293d497a1/common/core/web/input-processor/src/text/inputProcessor.ts#L270

(Sentry likes to omit our internal libraries in its default reporting view.)

Upon inspecting the relevant call path... I see that it's entirely possible to reach this point with a null or undefined value for activeKeyboard.

I wonder if this is related to #6703 in any manner?

That said, there's clearly something 'funny' going on here... as to get the error above, the user had to interact with an OSK. Which implies there should've been an active keyboard at the time of the keypress.

jahorton commented 1 year ago

Or possibly #6978, given its noted repro? (But via globe key, rather than shift)

mcdurdin commented 1 year ago

Please include text rather than screenshot of stack trace (so we can search it in future!)

TypeError: Cannot read properties of null (reading 'layout')
  at buildAlternates(node_modules/@keymanapp/input-processor/src/text/inputProcessor.ts:270:30)
  at processKeyEvent(node_modules/@keymanapp/input-processor/src/text/inputProcessor.ts:171:56)
  at handleClick(source/osk/preProcessor.ts:66:22)
  at raiseKeyEvent(source/osk/preProcessor.ts:44:22)
  at modelKeyClick(source/osk/visualKeyboard.ts:977:7)
  at inputEndHandler(source/osk/visualKeyboard.ts:727:11)
  at onInputEnd(source/osk/inputEventEngine.ts:55:9)
  at b.onTouchEnd(source/osk/touchEventEngine.ts:87:7)
jahorton commented 1 year ago

Not that it's terribly informative, but there's one interesting thing I just noticed when looking through the Sentry event logs.

This is happening down the chain of calls for an OSK event listener. As such, Sentry is logging that listener's event object for us, yielding data like this, as seen in some of the events:

[
  {
    currentTarget: div.phone.android.kmw-osk-frame > div.phone.kmw-osk-inner-frame.kmw-keyboard-,
    isTrusted: [object TouchEvent], 
    target: div.kmw-key-layer.kmw-5rows > div.kmw-key-row > div.kmw-key-square, 
    type: touchstart
  }
]

I'd like to highlight that target property - note the CSS class .kmw-5rows specified there.

For contrast, some of the other Sentry events log something like this instead:

[
  {
    currentTarget: div.phone.android.kmw-osk-frame > div.phone.kmw-osk-inner-frame.kmw-keyboard-, 
    isTrusted: [object TouchEvent], 
    target: div.kmw-key-row > div.kmw-key-square > div#default-K_I.kmw-key.kmw-key-default, 
    type: touchend
  }
]

Note the lack of .kmw-5rows.


What this tells us: this error is happening for multiple different active keyboards, with a fully-loaded OSK at the time that the event was received.

Why the activeKeyboard (or, at least, its layout) is going null during fat-finger generation is a separate question entirely.

jahorton commented 1 year ago

Also, I have not had any success reproducing this by trying something similar to https://github.com/keymanapp/keyman/issues/6978#issuecomment-1263043835.

sentry-io[bot] commented 1 year ago

Sentry issue: KEYMAN-WEB-H

jahorton commented 1 year ago

Of interest, for what it's worth:

Using the "Predictive Text: robust testing" test page in Chrome...

  1. Open the JS Developer console
  2. Triple-vertical-dots > More tools > Rendering
  3. Rendering > Emulate a focused page
  4. Click the "Type here" text box to display the OSK.
  5. In the JS console: keyman.core.activeKeyboard = null (the OSK remains visible, but predictions will likely disappear)
  6. Hit the OSK's 't' key.

Result:

inputProcessor.ts:270 Uncaught TypeError: Cannot read properties of null (reading 'layout')
    at InputProcessor.buildAlternates (inputProcessor.ts:270:50)
    at InputProcessor.processKeyEvent (inputProcessor.ts:171:61)
    at PreProcessor.handleClick (preProcessor.ts:66:34)
    at PreProcessor.raiseKeyEvent (preProcessor.ts:44:35)
    at VisualKeyboard.modelKeyClick (visualKeyboard.ts:977:20)
    at VisualKeyboard.release (visualKeyboard.ts:727:16)
    at InputEventEngine.onInputEnd (inputEventEngine.ts:55:21)
    at TouchEventEngine.onTouchEnd (touchEventEngine.ts:87:12)

Of note: if done on a keyboard without active predictive text...

kbdInterface.ts:1113 Uncaught TypeError: Cannot set properties of null (setting 'variableStores')
    at KeyboardInterface.applyVariableStores (kbdInterface.ts:1113:41)
    at RuleBehavior.finalize (ruleBehavior.ts:86:35)
    at keyman_4.text.RuleBehavior.finalize (domOverrides.ts:15:34)
    at InputProcessor.processKeyEvent (inputProcessor.ts:175:22)
    at PreProcessor.handleClick (preProcessor.ts:66:34)
    at PreProcessor.raiseKeyEvent (preProcessor.ts:44:35)
    at VisualKeyboard.modelKeyClick (visualKeyboard.ts:977:20)
    at VisualKeyboard.release (visualKeyboard.ts:727:16)
    at InputEventEngine.onInputEnd (inputEventEngine.ts:55:21)
    at TouchEventEngine.onTouchEnd (touchEventEngine.ts:87:12)

(Hence the Sentry issue I just linked above.)

jahorton commented 1 year ago

Of course, the question is what might have put KMW into such a state within the Android Keyman engine.

jahorton commented 1 year ago

So, I had an idea and investigated keyboard-loading Sentry reports for Keyman for Android; this has resulted in #7412. After all, failure to load a keyboard could possibly put KMW into a null-keyboard state.

Unfortunately, I do not believe #7412 is perfectly correlated, or even sufficient to explain all instances of this bug report. To wit:

7412 event reports:

image

For this Sentry event:

image

I don't have to go hunting to merge extra events to realize that the numbers for #7412 are markedly lower than for this issue's bug. Even if this issue's bug were triggered multiple times by the one from #7412... note the user counts. That is the definitive proof that #7412 cannot provide the full picture - this issue's bug has affected over four times as many users!

jahorton commented 1 year ago

The only spot in KMW code I can find that sets the activeKeyboard to null is here, during keyboard loading:

https://github.com/keymanapp/keyman/blob/0191d9a3ac81c5da9a3fa44eeb617fdcdb4381e3/web/source/keyboards/kmwkeyboards.ts#L583-L584

It's evaluated when we're dynamically loading the script for a not-yet-loaded keyboard.


Since I've neglected to point out why I'm chasing this point up 'til now...

inputProcessor.ts:270 Uncaught TypeError: Cannot read properties of null (reading 'layout')
    at InputProcessor.buildAlternates (inputProcessor.ts:270:50)
    at InputProcessor.processKeyEvent (inputProcessor.ts:171:61)
    at PreProcessor.handleClick (preProcessor.ts:66:34)
    at PreProcessor.raiseKeyEvent (preProcessor.ts:44:35)
    at VisualKeyboard.modelKeyClick (visualKeyboard.ts:977:20)
    at VisualKeyboard.release (visualKeyboard.ts:727:16)
    at InputEventEngine.onInputEnd (inputEventEngine.ts:55:21)
    at TouchEventEngine.onTouchEnd (touchEventEngine.ts:87:12)

Compare with:

https://github.com/keymanapp/keyman/blob/0191d9a3ac81c5da9a3fa44eeb617fdcdb4381e3/common/web/input-processor/src/text/inputProcessor.ts#L270-L271

There's... not really any other way to read this error. this.activeKeyboard is null when this occurs, and that should be impossible outside of keystrokes concurrent with globe-key-tap keyboard swapping... and I couldn't get even an inkling of a repro trying out that angle yesterday.

Likewise, for the other Sentry issue linked to this one:

https://github.com/keymanapp/keyman/blob/94214c020b99b1b7f4df900c709c5f5293d497a1/common/core/web/keyboard-processor/src/text/kbdInterface.ts#L1073-L1075

Same reasoning.

jahorton commented 1 year ago

Via a temporary local build, I was able to verify that the mobile app will not show an OSK if it is never able to successfully load a keyboard (so, if none ever succeeds). There must have been an active OSK at some point in the keyboard's lifecycle to get the callstacks we're seeing.

jahorton commented 1 year ago

I've managed to trigger a set of three at one point when doing mashes while quick-swapping the keyboard (via globe-key taps, 2 keyboards installed), but nothing consistent enough to call a repro.

https://sentry.io/organizations/keyman/issues/2776546156/events/ecfa376e541948239280ce41ee41dbca/ (+ the two previous)

Minor, possibly-relevant detail: it was not the first swap after the app was loaded.

At any rate, it occurred on the test device either during or immediately after a series of keyboard swaps.

jahorton commented 1 year ago

Yeah. Definitely not a reliable repro; I haven't seen it again since. I did get a few repros for other issues / events related to other issues, though.

I'm starting to wonder if it's a matter of filesystem lag during a keyboard load. Like, if a load goes "more asynchronous" than usual when trying to load a script tag as part of a keyboard swap, that'd leave a period where activeKeyboard is set null and this could occur, and where the OSK could still be visible. I could probably gerry-rig a version to simulate such an effect... then see if it'll be a more reliable repro than anything else I'm getting thus far.

jahorton commented 1 year ago

Easiest way to make this far more consistent: induce an artificial delay within KMW when keyboard swapping. The following event was generated from such a build:

https://sentry.io/organizations/keyman/issues/2776546156/events/7eeba213291849c6a0c9613978ef203c/

That said, this will only work for the first such quick-swap to new keyboards. This can be reset by longpressing globe and using the picker menu, though, and think I remember accidentally doing that a time or two during the "mash" session that led to the previous comment.

Here's how things look with the artificial repro:

image

(Occurred after the 'quick swap' globe key tap, then pressing 't' on the observed keyboard layout [before the swap completes].)

It should be obvious that sil_euro_latin is not the actually-active keyboard; that's basic_kbdfr. The Android side assumes synchronous keyboard swaps, despite there being a little bit of internal asynchronicity with the file loads. I just made that asynchronicity a lot worse in this artificial repro. As the OSK is forced visible in embedded-mode, it's not allowed to hide while the swap is still happening, so... the old layout stays visible and interactive until the swap completes.

Typing any output key while in this state will generate either this issue's original error or the other error from https://github.com/keymanapp/keyman/issues/6898#issuecomment-1267989657. It's a web-internal issue.


The question then is how best to handle these scenarios. The old keyboard lingers on touch devices because we force-set the OSK to be permanently visible once a keyboard has properly been loaded. For the common case, this is good - there's less "flashing" in and out as a result of that decision. But... if we do find ourselves in an intermediate state... this error may arise.

I'm not sure if it's actually a priority to resolve, as this appears not to affect any common-case user scenarios - at least not severely. I mean, if we're using keyboard quick-swapping, it auto-resolves after the first swap - so even if a user just has two keyboards installed and does a lot of code-switching, there's only a bit of an initial "stumble" before everything feels stable. (Which is probably why we haven't received any corresponding user reports of the issue.)

The cleanest possible way forward I can think of is to pre-load whatever keyboard is 'next' in the quick-swap position. Preloading just one shouldn't come with too high of a memory footprint, and it's easy enough to let that happen in the background asynchronously. Slightly less clean would be to disable all interactivity during the swap.


All of this said, if I've properly discerned the cause of the issue... there's the matter that in theory, we probably should see this on iOS as well, even if only in the context of the system keyboard.

jahorton commented 1 year ago

I can easily provide a branch for the artificial repro if desired. It would affect the full Web, Android, and iOS set.

mcdurdin commented 1 year ago

Good analysis. Preloading may be a simple enough mitigation -- although possibly not desirable for very complex keyboards such as Vietnamese Telex or Hieroglyphics.

I can easily provide a branch for the artificial repro if desired. It would affect the full Web, Android, and iOS set.

Sounds good to me -- mark it as "DO NOT MERGE" and give it a "test:" prefix.

jahorton commented 1 year ago

Good analysis. Preloading may be a simple enough mitigation -- although possibly not desirable for very complex keyboards such as Vietnamese Telex or Hieroglyphics.

I can easily provide a branch for the artificial repro if desired. It would affect the full Web, Android, and iOS set.

Sounds good to me -- mark it as "DO NOT MERGE" and give it a "test:" prefix.

See #7426.

jahorton commented 1 year ago

So, coming back to this with a fresh perspective, I had an idea. It's a bit of a pivot, but I believe it'll handle the majority of the error reports we're getting. See #7543 for the details.

I'm pretty sure it's a better solution than the previous suggestion would have been, too. Things would get weird if a key on keyboard 1 didn't exist on the default layer of keyboard 2, preloaded or not.

jahorton commented 1 year ago

Note: the changes of #7543 do nothing to handle any similar cases for key events from physical keyboards - neither in 'native' nor in 'embedded' mode. Those would need more specialized code to handle correctly.

I'd say we can probably consider this fixed once hardware keystrokes are handled well by non-embedded use cases.

jahorton commented 1 year ago

From a comment on that PR:

LGTM I think.

This is a fairly typical issue seen when using non-functional patterns where functions are dependent on external state.

So I guess a deeper fix would be to always use keyEvent.srcKeyboard throughout keystroke processing and not use this.activeKeyboard anywhere in the processKeyEvent function call or its children.

Note also that in the present implementation, physical keystrokes do not set keyEvent.srcKeyboard, as we cannot currently build them while making the guarantees the field's documentation advertises.

mcdurdin commented 1 year ago

I'd say we can probably consider this fixed once hardware keystrokes are handled well by non-embedded use cases.

I think it'd be good to split handling of hardware keystrokes into a separate issue, given it is likely to be an infrequent one, and mark it for future resolution, and then resolve this and the corresponding Sentry issues. What do you think?

jahorton commented 1 year ago

and the corresponding Sentry issues.

While this probably resolves the first (fat-finger) one (since it doesn't trigger on desktop devices), the other Sentry issue may not be reasonably resolved.

given it is likely to be an infrequent one

Infrequent != resolved, sadly.

I think it'd be good to split handling of hardware keystrokes into a separate issue

That part's pretty reasonable, though.

jahorton commented 10 months ago

Huh. So, for what it's worth...

image

Over the past 3 months, this has only reported for KMW 15.0 and before. There are no instances of the error having been generated from KMW 16.0 (or the 17.0 alpha)!

So... this may actually be safe to close?