jahorton commented 2 months ago

Continues from #10843.

Despite my efforts to avoid it, it appears necessary to retool the event engines to ensure a proper linkage between original event identifier and the corresponding GestureSource that will be generated and updated. The previous "identifier" to "source" mapping style appears to be too "loose" due to asynchronicity - it's time to update to a strategy that can directly use closure-capture mechanics to ensure proper binding.

Extra details: By default, browsers will reuse touch identifiers after they are freed - identifiers aren't unique. If this replacement happens while related events are deferred, we get problems like what can be seen in #10843's user test reports. Thus, it's necessary to establish the link between identifier and source while the identifier is still current for the touchpath represented by the source, then "lock in" that link - something that closure-capturing is well-suited for.

As the actual construction is queued for later, the best way I can see to leverage closure-capture mechanisms to resolve the issue is to creating Promises for the GestureSource synchronously during the actual touchStart handler - not within its closures. Resolution of the Promises can occur within closures, but assignments leveraging the Promise need to be maintained synchronously, as it can then be accessed synchronously within the other events for closure-capture in their handlers. (After all, under extreme rapid-typing situations, it's quite possible the Promise will only be fulfilled long after the fact, but this Promise will always be available, unambiguously, in time with the original events.)

User Testing

TEST_RAPID_TYPING_STABILITY: Using the Keyman for Android app, attempt to reproduce #10592 and #10646.

Set "EuroLatin (SIL)" as the active keyboard.
Rapidly type 10 random keys. Verify that 10 characters are emitted, possibly after a brief wait.
- Don't worry about accuracy - just try to type 10 characters as fast as humanly possible, whatever they are.
- Keep in mind that a space ( ) does count as a character; it's easy to accidentally hit it when not worried about accuracy.
Rapidly type while utilizing functions like the globe key, space bar and predictive texts intermittently. (Refer to https://github.com/keymanapp/keyman/pull/10641#issuecomment-1935882421 if needed.)
Try to rapidly type something specific while being precise.
- Perhaps typing rapidly should not skip keys; just make sure it's something you feel comfortable typing rapidly.
- If any of the letters appear out-of-order or appear to be skipped entirely, FAIL this test.
- Do not fail this test if you suspect you simply fat-fingered a key, hitting a close neighbor instead of the intended key.
FAIL this test if any of the following happens:
- Stuck key-highlighting
- Stuck key-previews
- Stuck subkey menu
- Stuck modipress

TEST_GESTURE_REGRESSIONS: Run the full set of Web's gesture-related regression tests and report back any errors.

For the longterm, it would likely be best to add TEST_RAPID_TYPING_STABILITY to that set of tests.
I assume you have a copy of these tests done up in your preferred format; if not, copy and paste the ones from SUITE_TOUCH_GESTURES here.

keymanapp-test-bot[bot] commented 2 months ago

User Test Results

Test specification and instructions

✅ TEST_RAPID_TYPING_STABILITY (PASSED): Retested this issue with the attached PR build (Keyman 17.0.285-beta-test-10960) in an Android 13 Mobile device and here is my observation: 1. I was able to reproduce the keyboard error message after typing rapidly using the OSK for a long time. (approx. 4 minutes) . 2. However, I was not able to reproduce it If I rapidly typed 10 random keys. (as it mentioned in the test steps). Seems to be working fine. (notes)
🟥 TEST_GESTURE_REGRESSIONS (FAILED): Retested using the 'Test uniminified KeymanWeb' test page on an Android (13) mobile device and here is my observation: 1. Followed the instructions as per in the Suite_Touch_Gestures page. 2. I did not see any issue up to Test_Basic_Modipress_Hold. 3. There is a sluggishness on the keyboard after longpressing % key (step 6) and then releasing it. Also, noticed that the keyboard changes back to the default layer. Seems to be an issue. I have attached the video file for reference. (notes)

Retesting Template

``` @keymanapp-test-bot retest TEST_GESTURE_REGRESSIONS ```

Test Artifacts

Android
Developer
iOS
- Keyman for iOS (simulator image)
- FirstVoices Keyboards for iOS (simulator image)
- TestFlight internal PR build version - 17.0.296 (0.10960.10629)
Keyboards
- Test Keyboards
Web
- KeymanWeb Test Home
Windows

bharanidharanj commented 2 months ago

Test Results

TEST_RAPID_TYPING_STABILITY (FAILED): Tested with the attached PR build (Keyman 17.0.271-beta-test-10960) on an Android 13 Mobile device and here is my observation: 1. Opened the Keyman In-app. 2. Rapidly typing on OSK leads to stuck-key-highlighting issues. 3. Tested the same on an Android 14 / API 34 emulator and I was able to reproduce it.

jahorton commented 2 months ago

I didn't see anything about an Android error this time, which is a plus. Though... that might be due to the more common error that surfaced. It'd surface quickly, so if the Android error never super-consistent to begin with, it may have just "not happened (yet)".

Turns out that I made a fairly simple mistake in one of the event handlers - I forgot that ending touches don't show up in their event's .touches property - only within the .changedTouches property. Our unit-test event simulation didn't meet that aspect of the spec and so failed to catch this detail.

tl;dr: The error I found when I went looking quite naturally made the last key touched act "sticky", among other things. I've fixed it and made it easier for unit tests to catch the same mistake in the future.

@keymanapp-test-bot retest all

bharanidharanj commented 2 months ago

Test Results

TEST_RAPID_TYPING_STABILITY (PASSED): Retested with the updated PR build (Keyman 17.0.285-beta-test-10960) in various Android OS versions and here is my observation: 1. Followed the steps as mentioned in the test description and I did not find any error message in Android emulators (Android 5.0, Android 7.1, Android 9.0 and Android 14.0). 2. I did not find any stuck-key-highlighting or any other issues (eg., stuck key-previews, stuck subkey menu, and stuck modipress) during the process. 3. FYI, I got a Keyboard error message while testing it in an Android 13 Mobile device. Apart from this, this build is working fine.

..Android 5.0 / API 21 emulator

..Android 7.1.1 / API 25 emulator

..Android 9.0 / API 28 emulator

..Android 14.0 / API 34 emulator

..Android 13.0 / Samsung Galaxy A23 Mobile device

sentry-io[bot] commented 2 months ago

Sentry Issue: KEYMAN-WEB-HY

sentry-io[bot] commented 2 months ago

Sentry Issue: KEYMAN-WEB-HX

Same behavior, older build. Not sure why I didn't see this in Sentry earlier today, but at least it's here now...

jahorton commented 2 months ago

About those two Sentry reports: yay, the old error types are dead, and it's something new now - and it's much simpler to resolve and to reason about than the old patterns.

I can also see why the user experience wasn't affected by the errors that arose; the new pattern established in this PR makes it clear (to me) that those errors are triggered from already-completed gestures. Still, it's best if we don't even have errors to report, yeah?

@keymanapp-test-bot retest TEST_RAPID_TYPING_STABILITY

One more time - let's aim for zero Sentry error reports this round.

bharanidharanj commented 2 months ago

Test Results

TEST_RAPID_TYPING_STABILITY (PASSED): Retested this issue with the attached PR build (Keyman 17.0.285-beta-test-10960) in an Android 13 Mobile device and here is my observation: 1. I was able to reproduce the keyboard error message after typing rapidly using the OSK for a long time. (approx. 4 minutes) . 2. However, I was not able to reproduce it If I rapidly typed 10 random keys. (as it mentioned in the test steps). Seems to be working fine.

jahorton commented 2 months ago

[...] after typing rapidly using the OSK for a long time. (approx. 4 minutes) [...]. Seems to be working fine.

Well... if nothing else, that's certainly quite rare, and is a massive improvement on what we previously had.

Looking at Sentry, I'm not seeing any new error reports - the only stuff I see is from the previous user-test run. Nothing new in Web or under Android within the test environment sets.

I do feel like sometimes stuff didn't show up for a little while, so I'll try to check back later in case there's some sort of small delay.

mcdurdin commented 2 months ago

Yes it's an improvement, but a keyboard that crashes every 4 minutes is not exactly stable.

jahorton commented 2 months ago

Yes it's an improvement, but a keyboard that crashes every 4 minutes is not exactly stable.

I agree... though it's hard to know what to address at the moment. Unlike prior user test runs, I'm not seeing any Sentry events to follow up on. Kinda need those to better address things and find the root sources.

I think sometimes error reports might have been withheld after one user test run and "released" somehow on the next? There have been times where I went looking for Sentry events and didn't see them... but after a user-retest, errors that should've been there to begin with were suddenly visible. I'm still not 100% sure on how or why that was happening.

Maybe, maybe rerunning the test for a bit might allow error reports through. This is just a feeling / suspicion though, and isn't a guarantee. It's certainly bugging me that we're getting error toasts that don't send Sentry error reports to match.

bharanidharanj commented 2 months ago

Test Results

TEST_GESTURE_REGRESSIONS (FAILED): Run the Gesture_Regression test on the following browsers and setups, and here are my observations:

Android_Chrome_touch: Tested on an Android device (version 13) using the Chrome browser. The test failed in 7 baseline tests. Android Test failed in Test_Basic_Modepress_Hold Tes_Numeric_From_Shift Test_Delayed_subkey Test_Custom_Multitap_Modifier Test_Alternating_Shift_And_Key Test_Modpress_Multitap_Flic_Preview Test_Flick_During_Modpress

..Test_Alternating_Shift_and_Key

https://github.com/keymanapp/keyman/assets/19683143/fa884651-dac8-4a29-8132-a87bb115726a

Tested on an iPhone 13 mobile device (iOS version 17.4) using the Safari browser, and all tests appear to have passed. A sample screenshot is attached for reference.

..Flick_Locking

Tested on an iPad (9th generation) using the iOS 15.5 simulator with the Safari browser. I observed that certain tests couldn't be performed in the simulator because they require two actions (holding the shift key and then long-pressing another key).

The following tests failed due to the previously mentioned issue. Test_Basic_Modpress Test_Basic_Modepress_Hold Test_Numeric_From_Shift Test_Delayed_subkey Test_Custom_Multitap_Modifier Test_Alternating_Shift_And_Key Test_Modpress_Multitap_Flick_Preview Test_Flick_During_Modpress Test_Flick_Basics Test_Flick_Correction Test_Flick_Locking

jahorton commented 2 months ago

I've been noticing a few gesture things along the lines of the failed test recently too on my personal iPhone... but hadn't realized it was happening that extensively. Interestingly, I'm not seeing some of the issues in iOS Simulator.

Did some digging via TestFlight and found that 17.0.261-beta seems to be comparatively fine, with a number of breaks happening in 17.0.262-beta.

... and after a bit of investigation, for iOS, the stuff I noticed is due to context issues that resolve with #10956. That doesn't exactly address the Android bits noted here, though. (17.0.262 did involve some change to iOS's context handling, so it does track somewhat.) Context resets generally involve layer resets, after all, so gestures involving layer manipulation could definitely be affected by context issues.

jahorton commented 1 month ago

@keymanapp-test-bot retest TEST_GESTURE_REGRESSIONS

I've gone ahead and merged in recent beta changes to this PR. I've also tested a local build of the resulting artifact locally on a (slow) Android test device here - so far, the only issues I have seen appear to be due to lagginess. It appears that aspect is due to certain calculations needed during a modipress.

For any tests that fail, please describe the nature of the test failures for each. Please make it explicitly clear if the reason for failure is that things are happening correctly, but too slowly.

"Certain calculations" - we rely on key layout information from the DOM when noting the 'item' for each ongoing touchpath. To get this for layers currently not visible, we make them visible temporarily, which triggers layout reflows - and that appears to be a major contributing factor to lagginess on the local test device.

..Test_Alternating_Shift_and_Key

https://github.com/keymanapp/keyman/assets/19683143/fa884651-dac8-4a29-8132-a87bb115726a

I see the video, but I'm not clear on the reason this was labeled as a failure. Is it due to each tap taking a while to process, rather than executing quickly? The individual H outputs do have a notable delay between each, and that delay is longer than I'd expect you to be intentionally doing when following the test instructions. It'd help to have that "spelled out" explicitly, as right now, all I can do is make my best guess as to your reasoning.

mcdurdin commented 1 month ago

"Certain calculations" - we rely on key layout information from the DOM when noting the 'item' for each ongoing touchpath. To get this for layers currently not visible, we make them visible temporarily, which triggers layout reflows - and that appears to be a major contributing factor to lagginess on the local test device.

Hmm, could we cache this information when the keyboard is first loaded? (Perhaps being smart enough to think about rotation as well?) This may not be a 17.0-beta change but I'm all for avoiding layout reflows during processing!

jahorton commented 1 month ago

In regard to that last point...

Drilling down the other high-percentage contributors reveals similar figures - a lot of time was spent on layout-reflow. I may want to spend some time seeing if that can be optimized.

Interestingly, I don't see anything even close to this level of lagginess within the iOS app/keyboard; maybe Safari webviews are caching layout calculations in a manner that Chrome webviews aren't?

mcdurdin commented 1 month ago

Drilling down the other high-percentage contributors reveals similar figures - a lot of time was spent on layout-reflow. I may want to spend some time seeing if that can be optimized.

Sounds good to me! Let's make sure we get the other beta PRs merged before you get too deep into this one though.

Interestingly, I don't see anything even close to this level of lagginess within the iOS app/keyboard; maybe Safari webviews are caching layout calculations in a manner that Chrome webviews aren't?

Potentially -- but also iPhones are much faster hardware so it may just be much less visible.

jahorton commented 1 month ago

"Certain calculations" - we rely on key layout information from the DOM when noting the 'item' for each ongoing touchpath. To get this for layers currently not visible, we make them visible temporarily, which triggers layout reflows - and that appears to be a major contributing factor to lagginess on the local test device.

Hmm, could we cache this information when the keyboard is first loaded? (Perhaps being smart enough to think about rotation as well?) This may not be a 17.0-beta change but I'm all for avoiding layout reflows during processing!

I could see caching each layer's layout info as it's loaded. That would be well within reason. (Then dumping the cache on a rotation or keyboard size-shift.)

jahorton commented 1 month ago

Drilling down the other high-percentage contributors reveals similar figures - a lot of time was spent on layout-reflow. I may want to spend some time seeing if that can be optimized.

Sounds good to me! Let's make sure we get the other beta PRs merged before you get too deep into this one though.

Agreed. If this is the only real source of problems at present, that can certainly be done separately, allowing this chain to merge.

mcdurdin commented 1 month ago

Just wondering: can we not calculate the regions without doing layout at all? We have all the numbers in the touch layout data and we have the bounding box size?

jahorton commented 1 month ago

Just wondering: can we not calculate the regions without doing layout at all? We have all the numbers in the touch layout data and we have the bounding box size?

At the very least, we can approximate it with that data - and we might even be able to perfectly match it. Sounds like a wise optimization path to investigate.

jahorton commented 1 month ago

I'm not sure why KMW is 3KB larger -- there's not that much new code?

This is 6 PRs in on the 🪠 chain. This is the one that pushes things over on the filesize warning, I guess.

bharanidharanj commented 1 month ago

Test Results

TEST_GESTURE_REGRESSIONS (FAILED): Retested using the 'Test uniminified KeymanWeb' test page on an Android (13) mobile device and here is my observation: 1. Followed the instructions as per in the Suite_Touch_Gestures page. 2. I did not see any issue up to Test_Basic_Modipress_Hold. 3. There is a sluggishness on the keyboard after longpressing % key (step 6) and then releasing it. Also, noticed that the keyboard changes back to the default layer. Seems to be an issue. I have attached the video file for reference.

https://github.com/keymanapp/keyman/assets/19683143/8b03ab00-1618-4ca8-bdf2-d00d90578dbb

jahorton commented 1 month ago

The remaining issues seem to mostly be performance-related. There is one notable, though ambiguous, functionality point of failure... but this does resolve a lot of far, far worse behaviors and will let us get beta to a better state. As such, @mcdurdin and I have made the decision to go ahead and merge this PR and its predecessors. (The next one in line is still pending review.)

keyman-server commented 1 month ago

Changes in this pull request will be available for download in Keyman version 17.0.301-beta

keymanapp / keyman

fix(web): proper linkage of sources to events 🪠 #10960

User Testing

User Test Results

Test Artifacts

Test Results

Test Results

Test Results

Test Results

Test Results