bug(android): modipress is not consistently releasing

mcdurdin commented 3 months ago

I sometimes find modipress does not release the modifier key. Repro I think: start new text field and type a letter. Backspace then press shift to return to default layer. Modipress a letter. Shift will not release.

17.0.264-alpha

In video below, 0:06 marks where the modipress starts.

https://github.com/keymanapp/keyman/assets/4498365/f537dd9a-bb84-4f53-aaf1-cacf51158643

jahorton commented 2 months ago

Huh. While investigating #10592, I happened to land a stuck subkey menu. Not quite the same as a stuck modipress, but the underlying principle should be the same.

I doubt what I did will be consistent, though.

jahorton commented 2 months ago

I'm curious if the PR chain ending with #10843 might fix the underlying cause of this issue. I'd add a user test for it... if the repro described above were consistent. I'm not getting a repro in iOS (which I'd expect), though it may also be worth noting that the rapid-typing issues were first noted for Android and also appeared to occur there more prominently. I'm also not seeing it (via a quick check, admittedly) on the in-office test device running #10843 on Android, for whatever that may be worth.

jahorton commented 1 month ago

Now that the rapid-typing fixes have been added, have you continued to see this issue? I haven't run into it once in testing since, and I don't believe Bharani's referred to anything similar in any recent user-testing reports.

mcdurdin commented 1 month ago

I can still see this on 17.0.306-beta. Looks like it might be related to timeout because if I do it slowly, it works correctly, but if I unshift, then modipress rapidly, it does not release shift afterwards.

In the video below, the first modipress has the expected outcome -- back to default layer. The second modipress is a little faster, and has the unexpected outcome -- stays on Shift layer.

https://github.com/keymanapp/keyman/assets/4498365/a7f5ed53-bc48-4240-a155-835d54729aaf

Passing the baton back to @jahorton 😉

jahorton commented 1 month ago

I can still see this on 17.0.306-beta. Looks like it might be related to timeout because if I do it slowly, it works correctly, but if I unshift, then modipress rapidly, it does not release shift afterwards.

In the video below, the first modipress has the expected outcome -- back to default layer. The second modipress is a little faster, and has the unexpected outcome -- stays on Shift layer. Screen_Recording_20240418_104457_Keyman.Beta.mp4

Passing the baton back to @jahorton 😉

Pay careful attention to that last one. You multitapped the shift key - I saw caps lock displayed at 0:08 in the timeline, and it looks like that was the modipressed layer. Since the multitap started from the shift layer, after clearing prior context, it did properly return to the shift layer. All of that behavior checks out.

Was the layer locked afterward, or was it that the return to the shift layer was unexpected?

screencap of active caps-lock layer

jahorton commented 1 month ago

This does make me wonder if we need to design something for cases like this in some way - as a user, it felt like "return to default, then start from there" while the system thought "they multitapped on the shift-layer's shift key".

mcdurdin commented 1 month ago

I see the caps layer too. Not sure if that was what I was originally seeing. The double-tap timeout may be too long, relates to #11183?

mcdurdin commented 1 month ago

Did some more testing, and I think the multi-tap timeout is definitely too long, and reducing it should mitigate this and #11183. We should do some research on double-tap timings.

mcdurdin commented 3 weeks ago

I will retest with 17.0.310-beta and hopefully be able to close after that.

mcdurdin commented 3 weeks ago

Testing with 17.0.313-beta, no change. I can still trigger the same state as in https://github.com/keymanapp/keyman/issues/10753#issuecomment-2062936553. At the end of this, the keyboard is in the Shift layer, not Caps -- typing another key returns to default layer.

The thing is, I can't see a scenario where the layer should remain in Shift state. I did the following keystrokes, reasonably rapidly:

X bksp shift (to return to base) shift+J

At this point, I see J on the screen, and the keyboard is displaying the Shift layer. As far as I can tell, the keyboard should be in the default layer, and the Caps layer multitap should not be involved in this sequence, either, because the second Shift is part of a chord, not a standalone tap.

https://github.com/keymanapp/keyman/assets/4498365/f62a2881-d251-44fb-9632-6fcc2295c917

Note: when I playback slowly, I can see the Caps indicator flash briefly. This seems like it shouldn't be happening.

jahorton commented 3 weeks ago

With that order of input, the gesture engine sees the second shift before the J, and it chords the second shift with the first, saying "multitap." There's no time-delay chording within the engine. If there's already something that exists for an incoming tap to be paired/chorded with, it uses that - and the prior tap was considered recent enough. (The second tap did follow pretty quickly after the first.)

jahorton commented 3 weeks ago

To be clear, I'm simply stating how the gesture engine is and how it currently treats the input. I don't currently see a "quick and easy" way to change this aspect of it, so it's not something reasonably changeable in time for release.

jahorton commented 3 weeks ago

Also, remembering the TWo-caps problem (#7173)... to me, conceptually, our anti-TWo caps handling and time-delay chording don't seem to be very compatible concepts for a generalized engine like Keyman. It would absolutely need design work.

mcdurdin commented 3 weeks ago

(nomenclature: a chord is two or more keys down simultaneously -- generally k1-down, k2-down, k2-up, k1-up for coherence -- which can't be done with multitap :grin:)

Understand that this cannot be fixed at this point in time.

Noting that this does not end up in caps lock state, however, so still seems strange!

mcdurdin commented 3 weeks ago

to me, conceptually, our anti-TWo caps handling and time-delay chording don't seem to be very compatible concepts for a generalized engine like Keyman.

Can you explain further?

jahorton commented 3 weeks ago

to me, conceptually, our anti-TWo caps handling and time-delay chording don't seem to be very compatible concepts for a generalized engine like Keyman.

Can you explain further?

The root of the TWo-caps problem was when two different keys were pressed near-simultaneously, with the first key triggering a layer-swap. The two taps didn't even have to "truly" simultaneous: if the second tap was received while the original layer was still visible, JS would report that the original layer's key is that one that was tapped, even if the key-event was only received after the active layer had been changed.

A lot of complexity went into the gesture engine to ensure that we process that second key separately, even inducing artificial delays:

If a text-outputting key had previously been held down before the first tap and was auto-completed by the new incoming (first) tap, that prior key needs to finish all standard gesture and event processing before we commit to any further decisions about the incoming key.
The newly-incoming first keypress should trigger all standard gesture and event processing.
When starting to process the second key, we repeat step 1 above. This can result in the needed layer-change.
After all of that is resolved, then we start processing the next key.

We're taking deliberate steps to ensure that the second key is not treated as simultaneous with the first, even if it actually was, because the second tap's layer should depend on changes in layer due to the first tap.

The same mechanism used in step 1 to auto-resolve a prior tap on a new keypress uses the same mechanism intended for "chording" multiple touchpoints, just with slightly different settings. The same mechanism is also used to "continue" a multitap if a prior tap was recent enough.

On receiving a new incoming tap:

Does this match any existing gesture? If so, process the results.
If there is a matched existing gesture, and it elects to "own" the new touchpoint, stop here.
- So, if a potential multitap receives a new tap, it now "owns" the new tap and uses it to sustain the multitap.
Else, if there is a matched existing gesture, but it doesn't elect to "own" the new one, finish all processing of the existing gesture before continuing.
If step 3 triggered, adjust all properties of the incoming tap (to avoid TWo caps).
Start processing for a new gesture based upon the new tap.
- So, for your scenario above, we can't start a modipress based on the layer of the "new" tap; we got stopped at step 2.

From there, when you go to tap the J, that aborts further multitapping... but since the multitap was based on a layer-switch key, that just locks in "modipress" mode. Note that the base layer of the multitap-modipress is "shift", matching the layer for the first tap of the SHIFT key. Furthermore, input has been received before the modipress timer has elapsed; this is one of the two conditions that trigger the "return to original layer" flag, which is why you don't end up on the caps lock layer when you release SHIFT.

So, about "time-delay chording": to me, a necessary pre-condition for time-delay chording like what you're intuitively expecting here is that we don't use a greedy, first-come first-served approach. You're asking to add ambiguity at step 2 and not commit to matching the second shift-tap to the prior one in case a third tap comes in within a short timeframe. The last thing I want to add within that chain above is further ambiguity.

It's... technically possible to set up special gesture-models that could still make this work without adjusting that set of steps, but it'd require additional gesture model components in order to handle the ambiguity and model things correctly. Also, it took a while to get the current gesture-models "just right" as they currently are. Furthermore, the multitap-modipress gesture model and its components probably are the most complex gesture specs we currently have operating within the engine - and we'd be adding even more complexity to that chain.

jahorton commented 3 weeks ago

It's pretty hard to distinguish "change base layer" + a quick "single-tap layer change modipress" + quick tap from a "double-tap layer change modipress" + quick tap.

Consider the following input pattern using sil_euro_latin:

From either the default or shift layer, triple-tap the numeric key, holding it down on the final tap.
Quickly tap the key in the W position: the € (euro) symbol.
Quickly release the modipress.

Perform that quickly; it's fundamentally the same action, so far as the engine can tell. Also note that it doesn't return to the numeric or symbolic layer - it returns to your original layer.

If you add enough of a delay between the first and second taps, you'll instead get a 2 and return to the default layer:

layer swap: numeric
Multitap times out
Second tap (from numeric layer): layer swaps back to default. This key has no defined multitaps.
Third tap (held, from the default layer): layer swaps to numeric again. There's no legal multitap to join here.
Fourth tap (W-position key): emits a 2
Third tap lifted: returns to original layer of modipress, which was default.

Tap anything else between the first and second taps, and you'll get the same effect. Either "decent wait" or "tap something else" is enough to prevent treatment as a multitap... and we did just reduce the length for "decent wait" with #11246.

mcdurdin commented 3 weeks ago

So, about "time-delay chording": to me, a necessary pre-condition for time-delay chording like what you're intuitively expecting here is that we don't use a greedy, first-come first-served approach. You're asking to add ambiguity at step 2 and not commit to matching the second shift-tap to the prior one in case a third tap comes in within a short timeframe. The last thing I want to add within that chain above is further ambiguity.

Not quite. I am saying that while the second tap is still down, then we don't resolve.

jahorton commented 3 weeks ago

So, about "time-delay chording": to me, a necessary pre-condition for time-delay chording like what you're intuitively expecting here is that we don't use a greedy, first-come first-served approach. You're asking to add ambiguity at step 2 and not commit to matching the second shift-tap to the prior one in case a third tap comes in within a short timeframe. The last thing I want to add within that chain above is further ambiguity.

Not quite. I am saying that while the second tap is still down, then we don't resolve.

As in... only add it to the multitap once the (second) tap is lifted (key-up), not on key-down?

The current model uses key-down in part because key-down is what activates the layer-change for modifier keys. Either it's part of the multitap, continuing it (hence 'caps' layer) or it's not (in which case, we have a new tap on the default layer's shift key, putting us in the shift layer.) I'm a bit wary of causing a sort of Schrodinger's "caps" situation (pardon the pun) here - we have an ambiguous layer state if we stay unresolved / undecided at this point.

If we then delay until key-up... which layer is the user tapping on the J key press? (And I mean both implied questions: "which visible layer are they tapping?" and "which layer does the engine use for the keypress?") We still haven't lifted the modifier-pressing key.

In case it's not otherwise clear, modipress isn't treated as a single chorded gesture internally. Its design treats it as a stack-pushing gesture that puts the engine into a new state for any new gestures that start during its lifetime. So, each new tap that starts during the modipress's lifetime is its own, fully separate gesture (associated with said "new state"). Going numeric -> 1234 from the default layer involves five total gestures as a result: one for the lifetime of the numeric keypress, then one gesture each for 1, 2, 3, and 4.

mcdurdin commented 3 weeks ago

As in... only add it to the multitap once the (second) tap is lifted (key-up), not on key-down?

Well yes. A tap is a short (time) gesture, and the same for multitap, so resolving it on key-up makes sense to me.

I agree there are definitely many complexities and potential ambiguities here, and I don't think we are going to be able to solve them right now.

However, I am still puzzled as to why we finish up on the Shift layer in my example above. That still seems just wrong. As far as I can tell, we should be finishing either on the Caps layer (because the multitap took effect), or the default layer.

jahorton commented 3 weeks ago

However, I am still puzzled as to why we finish up on the Shift layer in my example above. That still seems just wrong. As far as I can tell, we should be finishing either on the Caps layer (because the multitap took effect), or the default layer.

Per the "modipress" spec, as I understand it: if the user taps an output key while the modifier key is still held down, we revert the layer when the modifier key is lifted (regardless of time duration involved). Alternatively, if they've held it down for a sufficiently-long time without pressing any output keys, we revert the layer then too.

Yes, the multitap took effect, but it was a combined multitap+modipress - so it adheres to that part of the modipress spec. The big difference here is that the "original layer" for the combined form is the original layer that was initially tapped (for the first tap of the multitap), not the layer displayed at the time of the final tap of the multitap (for its final, held-down tap).

The reason I gave the sil_euro_latin example regarding use of the multitappable numeric key is to point out the reasoning: if I triple-tap the numeric key from the default layer for a modipress on the currency layer, I don't want to revert to the symbolic layer. I intend to return to the default layer.

mcdurdin commented 3 weeks ago

Moving this to 18.0 as it is a lower-priority issue.

keymanapp / keyman

bug(android): modipress is not consistently releasing #10753