rvaiya / keyd

A key remapping daemon for linux.
MIT License

Feature request: overload with interrupt on tap #310

Closed herrsimon closed 2 years ago

herrsimon commented 2 years ago

EDIT: clarified the meaning of <timeout>.

As already mentioned in #309, I'm vouching for yet another overload action, referenced as overload_tap here (purely to avoid confusion with the already used overload2, overload3 etc.).

Functionality

The action should be of the form overload_tap(<layer>, <action>, <timeout>) and have the following functionality:

The layer <layer> is activated if the bound key is held for at least <timeout> ms (activation takes place as soon as <timeout> ms have passed) and remains active until the key is released. In addition, if another key is tapped (pressed and released) before the timeout while the bound key is held, the layer is activated as soon as this other key is released and again remains active until the bound key is released. This additional activation distinguishes overload_tap from the planned overload2.
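To make the rule concrete, here is a minimal Python sketch of the resolution logic (not keyd code). The event format `(time_ms, key, "down" | "up")`, the function name `overload_tap_resolve`, and the assumption that the first event is the press of the bound key are all made up for illustration.

```python
from typing import List, Tuple

Event = Tuple[int, str, str]  # (time in ms, key name, "down" or "up"); hypothetical format


def overload_tap_resolve(events: List[Event], timeout_ms: int) -> Tuple[str, int]:
    """Resolve the bound key (the key of the first event) as "tap" or "hold".

    Returns (resolution, time_ms). For "hold", time_ms is the moment the
    layer would be activated; for "tap", it is the release of the bound key.
    """
    t0, bound, _ = events[0]
    deadline = t0 + timeout_ms
    pressed_while_held = set()  # other keys pressed while the bound key is down

    for t, key, kind in events[1:]:
        if t >= deadline:
            # Nothing decisive happened before the timeout: plain hold.
            return ("hold", deadline)
        if key == bound:
            if kind == "up":
                return ("tap", t)
            continue  # repeated "down" of the bound key, ignore
        if kind == "down":
            pressed_while_held.add(key)
        elif key in pressed_while_held:
            # Another key was tapped while the bound key is held:
            # activate the layer immediately instead of waiting.
            return ("hold", t)

    # No further events: assume the bound key is still held at the timeout.
    return ("hold", deadline)
```

For example, `overload_tap_resolve([(0, "a", "down"), (50, "h", "down"), (90, "h", "up"), (300, "a", "up")], 200)` resolves to a hold at 90 ms, whereas a purely timeout-based action would have to wait until 200 ms.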

Use case

This action is useful when overloading letter keys with a modifier, say when the a key should also act as shift. Let's assume this is implemented by a = overload2(shift, a, 200), using the proposed overload2 action described in #309. Summarizing what follows, while overload2 works great during input of text (when key rolls are produced), there are issues when entering isolated symbols (done through nested taps).

For a touch typist, two typical cases arise:

1. Using the modifier behaviour while typing words.

Here, key rolls are produced. For example when typing “Level”, the sequence begins with

<a_down> <l_down> <timeout> <a_up> <l_up>,

where, here and in the following, <timeout> denotes the moment at which <timeout> ms have passed since the first event (<a_down>).

Given the timeout is fine-tuned to the user's typing speed, overload2 works perfectly, with the capital L appearing at the timeout. The delay between <l_down> and <timeout> is unavoidable in order to distinguish an intended input of “also”, which is produced by

<a_down> <l_down> <a_up> <timeout> <l_up>

In this latter sequence, the output “al” appears at <a_up>, again with an unavoidable delay.

2. Using the modifier behaviour while typing isolated symbols

In contrast to case 1, this is typically done (at least for me) in the form of nested taps.

a) Inner tap is completed before the timeout

For example, when trying to type a capital H in isolation (as done in vim to go to the first line of the file), the sequence would be

<a_down> <h_down> <h_up> <timeout> <a_up>

With overload2, the capital H would be emitted at <timeout>, so there is again a delay between <h_down> and <timeout>. But assuming that the user always intends to use the modifier behavior when inputting nested taps (for me this is the case), this delay can be shortened: With overload_tap, H would be emitted at <h_up> already, because here it is clear that a nested tap will be produced. In my case, the time increment between <h_up> and <timeout> is noticeable and the only issue I have when using homerow modifiers in QMK.

b) Outer tap is completed before the timeout

There is another case where overload_tap has an advantage over plain overload2: Some users input nested taps very fast, so that the resulting sequence is

<a_down> <h_down> <h_up> <a_up> <timeout>

Here, overload2 would always produce “ah” instead of the intended “H”. This would be addressed by overload_tap as well, which would produce “H” on <h_up>.
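The difference between the two actions in cases 1, 2a and 2b can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not keyd's implementation: the function `resolve`, the event tuples, the timestamps and the idea that buffered keys are emitted all at once on resolution are hypothetical. `interrupt_on_tap=False` stands in for overload2, `interrupt_on_tap=True` for overload_tap, and the overloaded key is assumed to act as shift when held.

```python
def resolve(events, timeout_ms, interrupt_on_tap):
    """Return (emitted_text, time_ms) for the overloaded key pressed first.

    events: chronological list of (time_ms, key, "down" | "up").
    """
    t0, bound, _ = events[0]
    deadline = t0 + timeout_ms
    queued = []  # keys pressed while the decision is still pending

    for t, key, kind in events[1:]:
        if t >= deadline:
            break  # timeout reached first: resolve as hold below
        if key == bound:
            if kind == "up":
                # Released before the timeout: tap, flush everything unshifted.
                return bound + "".join(queued), t
        elif kind == "down":
            queued.append(key)
        elif interrupt_on_tap and key in queued:
            # Nested tap completed: resolve as hold right away.
            return "".join(k.upper() for k in queued), t

    # Held until the timeout: queued keys come out shifted.
    return "".join(k.upper() for k in queued), deadline


# Made-up timings with a 200 ms timeout.
roll_level = [(0, "a", "down"), (60, "l", "down"), (230, "a", "up"), (260, "l", "up")]
roll_also = [(0, "a", "down"), (60, "l", "down"), (120, "a", "up"), (150, "l", "up")]
case_2a = [(0, "a", "down"), (60, "h", "down"), (110, "h", "up"), (300, "a", "up")]
case_2b = [(0, "a", "down"), (60, "h", "down"), (110, "h", "up"), (150, "a", "up")]

for name, seq in [("Level", roll_level), ("also", roll_also), ("2a", case_2a), ("2b", case_2b)]:
    print(name, resolve(seq, 200, False), resolve(seq, 200, True))
```

With these timings, both variants agree on the key rolls, while for 2a the interrupt-on-tap variant emits “H” at 110 ms instead of 200 ms, and for 2b it emits “H” where the timeout-only variant emits “ah”.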

References

The proposed functionality is available in ZMK, where it is known as “balanced flavor”. QMK has the PERMISSIVE_HOLD option, which only addresses case 2b but leaves 2a open.

Necessity of a separate overload2

As already mentioned elsewhere, overload_tap would not make overload2 obsolete. The issues described in case 2 above do not arise for all users and crucially depend on individual typing behaviour. While overload_tap will most likely solve all my personal home row modifier worries, it might be another user's obstruction on the way to input nirvana.

nsbgn commented 2 years ago

I like this one!

There's been a lot of messages across the threads and I apologize if I'm missing something that's been laid out already. Then again, my missing something also indicates that documenting all this might become difficult, which is how I'm going to justify my questions... I'm going to focus first on:

Necessity of a separate overload2

It occurs to me that all recent timeout-based suggestions so far could be boiled down to just three general mechanisms, with a nice duality between each. Keep in mind that I'm basing this on use cases in the most abstract sense: I'm aware that fundamentally different mechanisms have been proposed (eg overload2), but I'm quite fond of the config's ability to be understandable at a glance, and I believe that pre-emptive catering to other 'possible' use cases can be a trap --- especially if it results in a proliferation of variants of overloads and timeouts. Specifically, I contest claims like:

case 1 should still be supported as it certainly will be optimal for somebody else. (...) whether the additional logic for nested taps which case 4 would provide is a benefit or not heavily depends on the user and use case in my opinion. (source)

Surely, someone might find a use for it, but I think that would need to be defended specifically. Otherwise, simplicity comes first. As it is, I can definitely see the added benefit of overload_tap over overload2, but not vice versa. ... except if it turns out that accidental a_down b_down b_up a_up actuations often happen when typing ab, which I find ~much~ more unlikely than a_down b_down a_up b_up, but admittedly haven't got the data to back up beyond my own typing habits. Is that possibility what you're basing your advocacy of both on, or am I overlooking something else?

herrsimon commented 2 years ago

I like this one!

I'm really happy to hear that!

There's been a lot of messages across the threads and I apologize if I'm missing something that's been laid out already. Then again, my missing something also indicates that documenting all this might become difficult, which is how I'm going to justify my questions...

In hindsight, I think it's me who has to apologize. As it has happened so often already, I again got overexcited with keyd and my recent posts were way too long for people who are not full-time obsessed with keyboards.

I believe that pre-emptive catering to other 'possible' use cases can be a trap

I agree with this. However, in the present case, I really do think that an additional overload2 (ignoring any key interrupts) should be implemented as well. Let me give two arguments for it:

1. overload2 is more usable for most users

Touch typists are the absolute minority among computer users. This can in fact be backed up by numerous studies. The ones I have read over the last years attribute their share to anything between 3% and 30%, depending on the sample, and as there is a likely overrepresentation of “power users” in the keyd user base, we're probably facing the upper end of this range. However, the remaining 70-97% type in a “non-standard” way, frequently need visual contact to the keyboard, at least partially apply a “hunt-and-peck” technique and do not type with a consistent speed and rhythm, instead varying from word to word, often recalling the key sequence of frequently typed words from memory and pressing the corresponding keys with the “wrong” fingers. I claim that this overwhelming share of people frequently produces nested taps while typing words. Because of the typically low typing speed, such users would use a rather high timeout, so that basically any nested tap would result in undesired behaviour (I think that this is why QMK only enables PERMISSIVE_HOLD - the rough equivalent to the proposed interrupt on tap - if the timeout is below 500ms). If there was no overload2, these users would essentially be forced to learn how to type properly in order to use home row mods via keyd. With overload2, however, an “advanced” feature like homerow mods would still be accessible. The pleasant whiff of elitism surrounding keyd (by the fact alone that there is no gui for editing the layout) should not turn into a reek...

2. overload2 provides usable “auto-shift”

There is in fact a valid use case for overload2 that just came to my mind on the way home: You could use it to implement a usable auto-shift feature (holding down a key for a certain time gives its shifted value), where usable here means without the speed penalty of having to hold the key to be shifted in isolation. Let me explain: If you opt not to overload your letter keys with modifiers and instead configure every single one of them in the form a = overload2(A, a, 200), where as always the timeout varies from user to user, you could simply hold a letter a bit longer when typing a capitalized word, while already typing the rest of the word (QMK or any of keyd's existing or planned timeouts with interrupts force you to wait while holding the letter to be shifted). Now you might think that the cognitive effort to pull this off with a high typing speed is too much, but this is not true. Just think about a capitalized word as a word whose second letter has been modified by the first and you realize that the technique is exactly the same as the one for regular home row mods. An example should make this clear: If you want to type “Keyd”, imagine that you “mod the e with a k” (think of “k” as being overloaded with some imaginary modifier). Alternatively, imagine in a more concrete way that the “k” is overloaded with shift and then try to type “Eyd”. You hence hold down the k key while pressing the e key, therefore producing <k down> <e down>. At this point, the k key has already been held down long enough to produce K, so you just have to release both keys (no matter in which order) to obtain Ke and then continue typing the rest of the word (yd here). This allows you to obtain shifted letters without pressing any additional shift modifier and - most importantly - without sacrificing typing speed.

I just tried this a bit, just imagining that the feature was there and timing the hold times of all keys, and indeed I get the same hold times as when using regular home row modifiers. For this to actually work, overload2 would of course have to accept an arbitrary action as a first argument.
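As a rough Python sketch of the auto-shift idea (purely illustrative): every letter key is treated as if it were bound to overload2(<shifted letter>, <letter>, <timeout>), and a letter comes out capitalised when its key was held for at least the timeout, no matter what else was pressed in the meantime. The function `auto_shift`, the event tuples and the simplification that letters appear in press order are assumptions; the real emission timing of such an action is not modelled.

```python
def auto_shift(events, timeout_ms):
    """Return the typed text, capitalising letters held for >= timeout_ms.

    events: chronological list of (time_ms, key, "down" | "up").
    Assumption: output order follows press order.
    """
    presses = []     # one [key, shifted] entry per key press, in press order
    open_press = {}  # key -> index into presses, for keys currently held
    down_at = {}     # key -> press time, for keys currently held

    for t, key, kind in events:
        if kind == "down" and key not in open_press:
            open_press[key] = len(presses)
            down_at[key] = t
            presses.append([key, False])
        elif kind == "up" and key in open_press:
            idx = open_press.pop(key)
            # Interrupts by other keys are ignored; only the hold time counts.
            presses[idx][1] = t - down_at.pop(key) >= timeout_ms

    return "".join(k.upper() if shifted else k for k, shifted in presses)


# "Keyd": k is held for 220 ms while e is already being typed (made-up timings).
keyd = [
    (0, "k", "down"), (150, "e", "down"), (220, "k", "up"), (260, "e", "up"),
    (300, "y", "down"), (380, "y", "up"), (400, "d", "down"), (470, "d", "up"),
]
print(auto_shift(keyd, 200))
```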

nsbgn commented 2 years ago

Okay, that convinces me. The first point not as much (I think you're underestimating what a niche we occupy, but again, not backed by data). The second, certainly.[^1] An added bonus (for me) is that it's another argument in favour of seeing it in terms of a general timeout rather than a specific overload.

(I also realize I should have made my comment in #309. My bad.)

[^1]: Edited to add: I probably would not do such a thing personally, since it forces max visual latency on every letter key

herrsimon commented 2 years ago

I think you're underestimating what a niche we occupy, but again, not backed by data

Probably you're right, but I still can't imagine that touch typists represent the majority of keyd users. One could probably search the mechanical keyboards reddit for polls; this could give us some idea.

An added bonus (for me) is that it's another argument in favour of seeing it in terms of a general timeout rather than a specific overload.

I also share your point of view (again, already changed my mind several times).

(I also realize I should have made my comment in #309. My bad.)

Throughout the whole issue of coming up with a “complete” timeout/oneshot/overload universe, I find it notoriously hard to focus on one specific thing only, but @rvaiya is right: This is the only way to move forward here. I propose that once we have agreed on features to implement, the naming and possible changes to existing actions (backwards compatible or not) should be discussed in a separate “integration” issue.

So now that the two of us back overload_tap, let's hope that @rvaiya gives us a seal of approval so that this issue can be closed.

rvaiya commented 2 years ago

From what I understand, this is effectively a micro-optimization of overload2 which avoids unnecessary visual latency. It is predicated on the assumption that <a down> <b down> <b up> can unambiguously be resolved as a hold.

I don't know whether or not this assumption holds in practice, but assuming it does, it may make sense to just make it the default behaviour of the newly proposed action (whatever the name may be).

Most of the recent proposals are fundamentally about trying to mitigate the side effects of the same use case, namely the ability to overload a key that is commonly tapped before another.

The existence of different implementations strikes me more as an artifact of QMK/ZMK development history than a thing necessitated by the problem itself.

While I'm not necessarily opposed to leaving granular control up to the user (it doesn't add much additional implementation complexity), I am not yet convinced that the existence of multiple (minutely varied) behaviours adds any value or warrants the additional cognitive overhead.

My instinct is that one of these solutions is objectively superior, and it probably makes sense to just steer the user in that direction.

But assuming that the user always intends to use the modifier behavior when inputting nested taps (for me this is the case), this delay can be shortened:

I suppose this is the key consideration.

As already mentioned elsewhere, overload_tap would not make overload2 obsolete. The issues described in case 2 above do not arise for all users and crucially depend on individual typing behaviour.

Are you aware of any evidence which supports this?


Edit: I just read slakkenhuis' response. His views largely accord with my own.

Surely, someone might find a use for it, but I think that would need to be defended specifically. Otherwise, simplicity comes first.

+1

The ones I have read over the last years attribute their share to anything between 3% and 30%,

Intriguing, I didn't realize touch typing was such a rare talent.

However, the remaining 70-97% type in a “non-standard” way, frequently need visual contact to the keyboard, at least partially apply a “hunt-and-peck” technique

It seems to me that the hunt and peck crowd would be less likely to produce nested taps. I might even go further and posit that most of them can get away with sequence based overloading (overload), but I am open to evidence which suggests otherwise.

depending on the sample, and as there is a likely overrepresentation of “power users” in the keyd user base,

+1

There is in fact a valid use case for overload2 that just came to my mind on the way home: You could use it to implement a usable auto-shift feature [...]

Putting aside the usability of this (about which you can probably guess my views :P), it's not obvious that this would alter the tap/hold sequences produced by the user. I am inclined to believe those would be invariant.

To help gather data, I've written a small script which detects the presence of nested taps. They were surprisingly sparse in my tests, though there were some notable exceptions. In my case, I found the bigrams na and am (on qwerty) particularly problematic (try typing mammal or stamina 10 times in a row).

You can find the script here. For best results, build keyd from the latest commit (I made some tweaks to monitor).
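The linked script is not reproduced here. As a rough illustration of what detecting a nested tap means, the following Python sketch flags every key that is pressed and released while an earlier key is still held. The `(key, "down" | "up")` event format and the name `find_nested_taps` are assumptions; this is not the script referred to above, and parsing of keyd monitor output is left out.

```python
def find_nested_taps(events):
    """Return a list of (outer, inner) pairs, one per nested tap.

    events: chronological list of (key, "down" | "up").
    A nested tap is <outer down> <inner down> <inner up> with outer still held.
    """
    held = []    # keys currently down, in press order
    nested = []
    for key, kind in events:
        if kind == "down":
            if key not in held:  # ignore auto-repeat
                held.append(key)
        elif key in held:
            idx = held.index(key)
            # Every key pressed earlier and still held encloses this tap.
            nested.extend((outer, key) for outer in held[:idx])
            held.pop(idx)
    return nested


# A nested tap on the bigram "na" versus an ordinary key roll.
nested_na = [("n", "down"), ("a", "down"), ("a", "up"), ("n", "up")]
rolled_na = [("n", "down"), ("a", "down"), ("n", "up"), ("a", "up")]
print(find_nested_taps(nested_na), find_nested_taps(rolled_na))
```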

herrsimon commented 2 years ago

First of all, @rvaiya and @slakkenhuis: Thank you very much for all of your very insightful input. Discussing with you is a real pleasure and I'm also learning a thing or two.

From what I understand, this is effectively a micro-optimization of overload2 which avoids unnecessary visual latency. It is predicated on the assumption that <a down> <b down> <b up> can unambiguously be resolved as a hold.

I don't know whether or not this assumption holds in practice, but assuming it does, it may make sense to just make it the default behaviour of the newly proposed action (whatever the name may be).

Most of the recent proposals are fundamentally about trying to mitigate the side effects of the same use case, namely the ability to overload a key that is commonly tapped before another.

The existence of different implementations strikes me more as an artifact of QMK/ZMK development history than a thing necessitated by the problem itself.

For QMK I have been convinced of this ever since I began using homerow mods; for ZMK it is very likely true as well. The problem is that there is no systematic data available on what configuration actually works (at least I couldn't find any), and the individual usage reports one finds vary wildly. The only conclusion one could draw from this is that overloading letter keys is a highly controversial topic and definitely not for everybody.

But assuming that the user always intends to use the modifier behavior when inputting nested taps (for me this is the case), this delay can be shortened:

I suppose this is the key consideration.

I agree with this, but in fact I was wrong when saying that I exclusively input nested taps when I want to apply the modifier. Using the script from your other post made me notice that the script I was using (see below) never detected anything.

As already mentioned elsewhere, overload_tap would not make overload2 obsolete. The issues described in case 2 above do not arise for all users and crucially depend on individual typing behaviour.

Are you aware of any evidence which supports this?

No, I was basically parroting what was written in the QMK/ZMK documentation and several tutorials as it sounded logical, again throwing all good principles overboard.

It seems to me that the hunt and peck crowd would be less likely to produce nested taps. I might even go further and posit that most of them can get away with sequence based overloading (overload), but I am open to evidence which suggests otherwise.

Several of my colleagues look like a ten-armed tentacle when typing and their typing speed is characterized by seemingly random slow downs and accelerations. Yet, as they apparently perfected all their weird finger movements during years of regular and intensive typing, I estimate their average typing speed to be somewhere in the range of 70-100 wpm. In any case, using pure overload would definitely slow them down. But then again, this is a very small sample and furthermore none of them belongs to the typical keyd clientele.

There is in fact a valid use case for overload2 that just came to my mind on the way home: You could use it to implement a usable auto-shift feature [...]

Putting aside the usability of this (about which you can probably guess my views :P), it's not obvious that this would alter the tap/hold sequences produced by the user. I am inclined to believe those would be invariant.

It is in fact not invariant (I tested this for some more time after writing the respective post), but after testing I now share your view regarding usability.

To help gather data, I've written a small script which detects the presence of nested taps. They were surprisingly sparse in my tests, though there were some notable exceptions. In my case, I found the bigrams na and am (on qwerty) particularly problematic (try typing mammal or stamina 10 times in a row).

You can find the script here. For best results, build keyd from the latest commit (I made some tweaks to monitor).

Before posting the issue, I first tested by manually inspecting the xev output and then measured my nested taps with a script to which the output of keyd monitor can be piped. It also measures the tapping time, but for this to work, the time needs to be added as a first field on reported key events in monitor.c (I reused your get_time_ms() routine). After your script reported some nested taps, I discovered that mine was not only ugly, but also incorrect, as in not detecting anything. Now it should work. I let it run for the past few hours and it regularly detects nested taps during normal text input, with times below the threshold (so they would result in mistypings). Of course I didn't use the QMK keyboard with my homerow modifier configuration to test this, but the results still provide enough evidence for concluding that the suggested feature will actually not be useful for me.
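The kind of measurement described above can be sketched in Python as follows. This is not the script mentioned in the post: the timestamped event tuples, the name `fast_nested_taps`, and the choice to measure from the press of the outer key to the release of the inner key (the interval that would have to stay below the timeout for interrupt on tap to fire) are assumptions.

```python
def fast_nested_taps(events, threshold_ms):
    """Return (outer, inner, elapsed_ms) for nested taps faster than threshold_ms.

    events: chronological list of (time_ms, key, "down" | "up").
    elapsed_ms is the time from <outer down> to <inner up>.
    """
    down_at = {}  # key -> press time, for keys currently held
    hits = []
    for t, key, kind in events:
        if kind == "down":
            down_at.setdefault(key, t)  # keep the first press, ignore auto-repeat
        elif key in down_at:
            inner_down = down_at.pop(key)
            for outer, outer_down in down_at.items():
                # Only keys pressed before the inner key enclose it.
                if outer_down < inner_down and t - outer_down < threshold_ms:
                    hits.append((outer, key, t - outer_down))
    return hits


# Made-up timings: a quick nested tap on "na" during normal typing.
quick = [(0, "n", "down"), (40, "a", "down"), (90, "a", "up"), (130, "n", "up")]
print(fast_nested_taps(quick, 200))
```

Any hit reported while typing ordinary text is a sequence that an interrupt-on-tap action with that timeout would have resolved as a hold, i.e. a likely mistyping.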

So coming back to

My instinct is that one of these solutions is objectively superior, and it probably makes sense to just steer the user in that direction.

your instinct might be right. It seems as if the simple interrupt-ignoring logic is the direction to steer the user in; at least I can say that it has worked flawlessly for me for a few weeks now, as already mentioned (modulo unavoidable visual latency). Boldly inferring from just a few hours of my personal typing, the proposed interrupt on tap logic does not seem to be useful. I'm therefore closing this issue.