Closed pyfisch closed 1 year ago
It seems to me that @pyfisch's original proposal suggested defining variants only for non-printable keys. I agree with this, and I don't see why printable characters should have an enum variant/const value.
It's much nicer to match on Key::Unicode("ÿ") than on Key::YDiaeresis, and it makes application developers' job easier when, for example, comparing pre-defined characters with the "current" input.
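To make the comparison concrete, here is a minimal sketch of what matching could look like if printable keys carry their text. The `Key` enum and the `describe` function are hypothetical and only illustrate the idea; they are not winit's API.

```rust
/// Hypothetical key type for illustration only (not winit's actual API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Key {
    /// A printable key, carried as the text it produces.
    Unicode(&'static str),
    /// Non-printable keys get dedicated variants.
    Escape,
    Enter,
}

/// Describe a key, comparing printable input against plain string literals.
fn describe(key: Key) -> &'static str {
    match key {
        Key::Unicode("ÿ") => "y with diaeresis",
        Key::Unicode("+") => "plus",
        Key::Unicode(_) => "other printable key",
        Key::Escape => "escape",
        Key::Enter => "enter",
    }
}
```

With this shape, an application can compare against any character it likes without the library having to define a variant per character.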
This appears to be @pyfisch's proposal.
The key field contains "translated" input, so for example if one wishes to match against + input this is fine; if one wishes to match against shift+= it is not (though arguably this shouldn't be matched anyway). This field does not distinguish between e.g. left and right Control keys or main-area numbers and the numpad, but the location field does (though as with code below, will extra computation be required to calculate this value?).
There may be another issue with key: modifiers affect the result (e.g. Shift + c = C). So does Ctrl + c emit c with the "control" modifier? For whatever reason, on X11, the current ReceivedCharacter(char) event emits U+03 instead, so making this interface work "as expected" might not be so easy... I don't know.
The code field is what we referred to recently as location. As was mentioned recently, it may not be desirable to include this field since (1) it will likely include extra computations per key-stroke, even if unneeded, and (2) it may not be available on all platforms (e.g. it makes little sense for virtual/screen keyboards). But if this field is removed, something else is needed (probably scancode).
Other than that, provided this interface can be implemented for all platforms, I think it is sufficient.
There may be another issue with key: modifiers affect the result (e.g. Shift + c = C). So does Ctrl + c emit c with the "control" modifier? For whatever reason, on X11, the current ReceivedCharacter(char) event emits U+03 instead, so making this interface work "as expected" might not be so easy... I don't know.
X11 keyboard handling sucks. What you want to use on X11 and Wayland is xkbcommon, which does not have this and many other quirks. There are various Rust bindings for it; they are somewhat outdated/incomplete, but with bindgen you can easily make your own binding.
You'll get the same U+03 on Wayland with libxkbcommon too; that's what you should get, since you're asking for control chars. I'm not sure if you can tell libxkbcommon not to do so, but for example alacritty relies on that, and it's also the case on all platforms (at least Windows/macOS/Wayland/X11) that control chars are sent for control keys. It's not a bug or anything, it's just what it is.
@kchibisov There are functions to get the symbol without the control transformation applied.
yeah, but you're expecting control chars most of the time, and I'd be surprised to not get them.
Relevant on control chars: Wikipedia, XKB, xkb transformation without control translations
yeah, but you're expecting control chars most of the time, and I'd be surprised to not get them.
Are these standard enough to be reliably useful? Because the alternative is to use the API the way we're trying to design it: match Control modifier + c instead of U+03.
I think it should be possible to find out what the pressed key combination was. I'm not saying that it should be what the keyboard input event presents by default but there should at least be a function that allows the user to extract the layout-dependent key that was pressed together with the modifier. If this is too difficult to implement I could accept that this is not something that winit should offer.
Are these standard enough to be reliably useful?
Well, for something like a terminal emulator, yeah, since if you suppress those you'll have to bind everything possible to those control chars yourself, which is nearly impossible to do. You can match Control + c and suppress chars in your app without issues if you just want ctrl + c as a hotkey.
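The two approaches discussed here can be sketched side by side. The first function assumes the conventional ASCII control mapping (Ctrl plus a letter keeps only the low five bits of the character code), which is how Ctrl + c ends up as U+0003; the second matches on modifier state plus the unmodified key. `Modifiers`, `control_char` and `is_hotkey` are hypothetical names for illustration, not winit or xkbcommon API.

```rust
/// Hypothetical modifier state (winit's real `ModifiersState` differs).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Modifiers {
    ctrl: bool,
    shift: bool,
    alt: bool,
}

/// Conventional ASCII control transformation: keep the low five bits.
/// Returns `None` for characters that have no control equivalent.
fn control_char(c: char) -> Option<char> {
    match c {
        '@'..='_' | 'a'..='z' => Some(((c as u8) & 0x1F) as char),
        _ => None,
    }
}

/// Hotkey-style matching: exactly Ctrl held, plus the wanted unmodified key.
fn is_hotkey(key: char, mods: Modifiers, wanted: char) -> bool {
    mods.ctrl && !mods.shift && !mods.alt && key == wanted
}
```

A terminal emulator would typically forward the result of something like `control_char` to the child process, while a GUI application would more likely use the `is_hotkey` style and ignore the control character.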
I think it should be possible to find out what the pressed key combination was. I'm not saying that it should be what the keyboard input event presents by default but there should at least be a function that allows the user to extract the layout-dependent key that was pressed together with the modifier. If this is too difficult to implement I could accept that this is not something that winit should offer.
I'm not sure what you're talking about, it's already possible in winit? You always know the modifiers state and the keys you're pressing, and modifiers are a special event. I'd just mention that you can't map scancode -> position, etc., since you can't assume qwerty. So the only thing that you can do to help downstream is to map the scancode to Physical(A) or something like that, and have virtual keys like we have right now. If you're planning to match by received chars it could be tricky, and you can't really design for it because of compose keys, which could send strings of text in one press.
I'm not sure what you're talking about, it's already possible in winit?
See #1700 for a more detailed explanation of what I was referring to when I wrote "extract the layout-dependent key that was pressed together with the modifier".
@ArturKovacs this relates to what we recently discussed: Control + Key gets mapped to some control char code, but it is also possible to get the unmapped version (for Linux platforms at least).
At this point it looks like a minimalist workable API would be to return the scancode (XKB keysym) and expose functions to map to unicode (with and without control char mapping) and to the key location.
If we want to have something like that, we shouldn't expect clients to do so (it's way more complex than you may think); what we could do is provide a per-window configuration for how modifiers and such input should be handled.
Not to mention that you may have a control + certain key rule in xkb that will transform to Super + key on the fly, and you can't just ignore such things, meaning that you should always call into xkb with a proper keymap (which you can't pass to winit users, hence not cross-platform) and use that control. Also, someone has to handle the compose key; right now that's winit's job, so if you expose xkb downstream, who should handle composing?
I looked into the documentation of xkbcommon and it doesn't seem that complicated.
Do I misunderstand something or could this really be implemented as shown below?
I looked into the documentation of xkbcommon and it doesn't seem that complicated.
I'm saying that clients shouldn't do libxkbcommon handling; we're already using it on Wayland: https://github.com/Smithay/client-toolkit/blob/aac3c503242c8a2a9f37f4a2231e7b540e3a575c/src/seat/keyboard/mod.rs#L422.
I don't understand what you mean by that. Can you explain why code similar to what I wrote shouldn't be in winit?
I mean in winit clients, you're free to do that internally in winit if you want to.
I don't mean expose the xkbcommon functions. I mean make safe wrappers which require only the scancode and the EventLoop or Window as context and return a char. The scancode can be whatever we want to make it and include the keysym, even a layout identifier if necessary.
In an attempt to move this to the implementation phase I tried to gather all suggestions described so far and I compiled an API which seems to represent the most reasonable compromise. Note that each struct may have implementation specific private fields added.
Please let me know if something seems wrong but otherwise I'd like to start implementing this for Windows in a few days.
I'm also willing to make Linux, macOS, and web implementations.
Note that the following code has been updated several times since posting this so some of the following reflections may be obsolete.
This is a very reasonable proposal.
However there is one particular issue I have a strong opinion about. That is the decision to ignore all modifier keys for the LogicalKey::Unicode value. If the KeyboardEvent and CompositionEvent is correctly implemented on a platform (which is hard) you can reconstruct the input text from these two events (and don't need to rely on ReceivedChars or similar API). This is not possible if modifiers are ignored. An additional concern is that many applications offer single-key-shortcuts such as "[", "]", "?" etc. that require Shift or other modifier keys depending on the keyboard layout.
I think there should be an API to get the key value without modifiers, but it must not be the primary API.
I am happy you want to do some implementations. If you have any questions about Linux or the Web you can ping or email me.
If the KeyboardEvent and CompositionEvent is correctly implemented on a platform (which is hard) you can reconstruct the input text from these two events [...] I think there should be an API to get the key value without modifiers, but it must not be the primary API.
In my proposal the application may use CompositionEvent::Char for key input with modifiers and use LogicalKey for key input without modifiers. Furthermore every variant of the CompositionEvent is affected by modifier keys, meaning that composition_input is likely the only field one needs for text input.
An additional concern is that many applications offer single-key-shortcuts such as "[", "]", "?" etc. that require Shift or other modifier keys depending on the keyboard layout.
I think this depends on how the application wants to handle these cases.
One way to handle this would be to use the PhysicalKey if position is more important than the label.
Another way is to use the composition input API to look for such input.
if let Some(CompositionEvent::Char(ch)) = input_event.composition_input {
    if ch == '?' {
        // ? was pressed either directly or through a modified key
    }
}
If the KeyboardEvent and CompositionEvent is correctly implemented on a platform (which is hard) you can reconstruct the input text from these two events (and don't need to rely on ReceivedChars or similar API).
Could you elaborate on why "reconstructing the input text" instead of relying on platform APIs is desirable?
In my proposal the application may use CompositionEvent::Char for key input with modifiers and use LogicalKey for key input without modifiers. Furthermore every variant of the CompositionEvent is affected by modifier keys, meaning that composition_input is likely the only field one needs for text input.
This seems useful. I did miss CompositionEvent::Char. Although I still can't (easily) find out when a given symbol for which modifiers are needed is released.
An additional concern is that many applications offer single-key-shortcuts such as "[", "]", "?" etc. that require Shift or other modifier keys depending on the keyboard layout.
I think this depends on how the application wants to handle these cases.
One way to handle this would be to use the PhysicalKey if position is more important than the label.
Many applications have mnemonics, so they are about symbol, not location.
If the KeyboardEvent and CompositionEvent is correctly implemented on a platform (which is hard) you can reconstruct the input text from these two events (and don't need to rely on ReceivedChars or similar API).
Could you elaborate on why "reconstructing the input text" instead of relying on platform APIs is desirable?
If you are building a web browser. :wink: I proposed this interface originally for use in the servo browser, but since Mozilla discontinued this project it is not as important now. Besides browsers, you will need it for high-quality GUI applications, especially in the context of IME. As I understand the CompositionEvent API, its primary purpose is to be able to accurately construct the input text in a text field/word processor. If you decide you don't need this, an API with just characters, key-down, key-up is likely sufficient for most games and simple applications.
This seems useful. I did miss CompositionEvent::Char. Although I still can't (easily) find out when a given symbol for which modifiers are needed is released.
It took me a moment to realize what you meant by this, but I think you mean that there's no easy way to tell that i.e. ! was released when Shift+1 is required to produce !.
Many applications have mnemonics, so they are about symbol, not location.
I'm beginning to think that a proper keyboard layout query API is required to support the greatest number of use-cases.
pub struct KeyboardLayout {
    layout: platform_impl::KeyboardLayout,
}

impl KeyboardLayout {
    pub fn logical_key(&self, physical_key: PhysicalKey, modifiers: ModifiersState) -> LogicalKey {
        self.layout.logical_key(physical_key, modifiers)
    }
}

impl PhysicalKey {
    pub fn to_logical(self, layout: &KeyboardLayout, modifiers: ModifiersState) -> LogicalKey {
        layout.logical_key(self, modifiers)
    }
}
I think Android, Linux (libxkbcommon) and Windows have good enough APIs to implement the minimal example above, and I think there's an experimental API for this in browser-land but I don't know if it's usable for the above example. I have absolutely no idea what the situation is on either macOS or iOS.
There would probably have to be a KeyboardLayoutChanged(KeyboardLayout) event somewhere, but I'm not sure where to stick it.
/// This value ignores all modifiers like shift and ctrl, and
/// it is always uppercase.
Unicode(&'static str),
I'm not sure if converting to uppercase is the right thing to do here. Some characters don't round-trip losslessly through to_uppercase and to_lowercase.
This might be doable if you use what the key would give you if Caps Lock is the only active modifier. Keys affected by Caps Lock have unambiguous mappings between the character emitted without Caps Lock and one emitted with Caps Lock on the layouts I regularly use, but there are a lot of different layouts out there, and one of them might not have this property.
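The round-trip concern can be checked directly with the standard library. The helper below is a hypothetical name used only for this illustration; the case conversions themselves are the standard `str::to_uppercase` and `str::to_lowercase`.

```rust
/// Returns true if upper-casing and then lower-casing `s` gives back `s`.
fn round_trips_through_uppercase(s: &str) -> bool {
    s.to_uppercase().to_lowercase() == s
}
```

For plain ASCII letters such as "c" this holds, but "ß" upper-cases to "SS" and comes back as "ss", and the dotless "ı" upper-cases to "I" and comes back as "i", so an "always uppercase" guarantee would lose information for such keys.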
This seems useful. I did miss CompositionEvent::Char. Although I still can't (easily) find out when a given symbol for which modifiers are needed is released.
Excuse me @pyfisch, I completely missed this. Yeah, I think we can give a written guarantee in the documentation that every CompositionEvent::Char will come with a Some(KeyboardEvent) from which you can tell whether it's a press or a release. This could maybe be expressed by the types themselves but I think any such design will result in pattern matching hell in the application.
/// This value ignores all modifiers like shift and ctrl, and
/// it is always uppercase.
Unicode(&'static str),
I'm not sure if converting to uppercase is the right thing to do here. Some characters don't round-trip losslessly through to_uppercase and to_lowercase.
Yeah that uppercase thing I wasn't entirely sure about and I absolutely didn't know that the conversion is not lossless. Thanks for pointing that out! I'll remove the uppercase guarantee and update this to "This value ignores all modifiers including but not limited to shift, caps lock, and ctrl". Does that sound good?
I think you need a couple more commas (and maybe the <kbd> tag), but yes.
"This value ignores all modifiers including, but not limited to, Shift, Caps Lock, and Ctrl"
Thank you, I updated the comment adding the kbd tags as you suggested. I did not include the additional commas however because I'm confident that the current form is grammatically correct, and I think we should not get into a discussion about English grammar or writing style in this thread.
I was thinking more like this. (I'm not certain if we need the layout identifier. I left out composition events which could be included as above.) Both KeySym and KeyboardLayout are probably just a u32 internally or even smaller.
/// Platform specific value identifying a key on a keyboard
#[derive(Clone, Debug, PartialEq, PartialOrd, Hash)]
struct KeySym { .. }

/// Platform specific value identifying the keyboard layout (from a list of available ones)
#[derive(Clone, Debug, PartialEq, PartialOrd, Hash)]
struct KeyboardLayout { .. }

struct KeyboardEvent {
    keysym: KeySym,
    layout: KeyboardLayout,
    key_state: ElementState,
    repeat: bool,
}

impl Window {
    fn get_layout_name(&self, layout: KeyboardLayout) -> String;
    fn get_active_layout(&self) -> KeyboardLayout;

    /// Translate the key according to the layout, but disregarding CapsLock and modifier keys
    ///
    /// If a unicode value is produced, it is usually lower-case.
    fn get_key_label(&self, keysym: KeySym, layout: KeyboardLayout) -> KeyLabel;

    /// Translate the key according to the layout and modifiers
    fn get_transformed_key(&self, keysym: KeySym, layout: KeyboardLayout) -> KeyLabel;

    /// Attempt to find the [`KeySym`] producing this label
    ///
    /// This is not guaranteed to return a result. On some platforms it may
    /// never return a result. In some cases it may arbitrarily choose one of
    /// multiple [`KeySym`]s producing this label.
    fn find_keysym(&self, label: &KeyLabel, layout: KeyboardLayout) -> Option<KeySym>;

    /// Attempt to find the [`KeyLocation`] corresponding to a [`KeySym`]
    fn get_key_location(&self, keysym: KeySym) -> Option<KeyLocation>;

    /// Attempt to find the [`KeySym`] corresponding to a [`KeyLocation`]
    fn find_keysym_by_location(&self, location: KeyLocation) -> Option<KeySym>;
}

#[non_exhaustive]
enum KeyLabel {
    // TODO: maybe this can be backed by [u8; LEN]
    Unicode(&'static str),
    Ctrl(Location),
    Alt(Location),
    ...
    LeftArrow,
    RightArrow,
    ...
    F1,
    F2,
    ...
}

enum Location {
    Standard,
    Left,
    Right,
    Numpad,
}

/// Identifies locations on a keyboard (relative to US Qwerty?)
#[non_exhaustive]
enum KeyLocation {
    ...
}
Edit: added KeyLocation
I don't see the reason for introducing the KeyboardLayout. Please describe the use-case for it in simple terms so I can understand it 😄
What is the difference between KeyLocation and KeySym? Why isn't KeyLocation enough?
I don't see the reason for introducing the KeyboardLayout
In general, translation is dependent on the layout, and I think all major OSs allow convenient layout switching, so there's no guarantee that the active layout is the same one as when the app launched. This means that if we give the app an API for translating KeySym → KeyLabel that can be called later, the value may be wrong if we don't account for this. (As an alternative we could embed the layout within KeySym, though if we ever let apps actively switch layouts this will probably come back to haunt us.)
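The layout dependence described here can be illustrated with a toy lookup. Everything below is hypothetical: the two layouts, the placeholder key value 16, and the function name only mirror the shape of the proposed API and say nothing about real platform values. The one real-world fact used is that the key which produces "q" on a US QWERTY layout produces "a" on a French AZERTY layout.

```rust
/// Hypothetical platform key identifier (the value used below is a placeholder).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct KeySym(u32);

/// Hypothetical layout identifier with just two layouts for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum KeyboardLayout {
    UsQwerty,
    FrenchAzerty,
}

/// Translate a key to its label under a given layout.
/// Returns `None` for keys this toy table does not know.
fn get_key_label(keysym: KeySym, layout: KeyboardLayout) -> Option<&'static str> {
    match (keysym, layout) {
        (KeySym(16), KeyboardLayout::UsQwerty) => Some("q"),
        (KeySym(16), KeyboardLayout::FrenchAzerty) => Some("a"),
        _ => None,
    }
}
```

If the application stores only the key identifier and translates it later, a layout switch in between changes the answer, which is why the layout is passed explicitly in this sketch.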
What is the difference between KeyLocation and KeySym? Why isn't KeyLocation enough?
Two things I guess. One is that KeyLocation is whatever enum we define; there's no guarantee that it will contain a unique value for every key on every keyboard so if we map the OS's identifier to this, then back again for translation to KeyLabel, the result may be lossy. Secondly, that's an extra translation step.
Note: KeySym is what I previously called scancode. This is vaguely modelled on the XKB API.
In general, translation is dependent on the layout [...]
Alright that's fair, but if I'm not mistaken the KeyboardLayout is not needed if we stick with the other proposal.
One is that KeyLocation is whatever enum we define [...] Secondly, that's an extra translation step.
I see.
These points are all completely valid, but only when using the architecture you proposed. However I'm not seeing why your proposal is objectively better than the other one. I see two aspects where it's an improvement over the other one:
KeyLabel is only done when the application truly needs it, so one can save a bit of processing time. Although I don't think this alone is a good argument because the performance impact is negligible for something that happens as rarely as a keypress.
Char variant of the CompositionEvent and adding a character field to the KeyboardEvent.
At the same time it adds the burden of having to keep track of the KeyboardLayout, which of course is not a burden if it has a use that I'm not seeing right now.
Well, this API is certainly less simple to use, but its advantages are a little more extensive:
We have two desirable translations from native keysyms to labels/values: including modifiers and excluding them. As @kchibisov said above, you ideally want both (so your API should have two versions of logical_key):
Well, for something like a terminal emulator, yeah, since if you suppress those you'll have to bind everything possible to those control chars yourself, which is nearly impossible to do. You can match Control + c and suppress chars in your app without issues if you just want ctrl + c as a hotkey.
Translations to/from various formats may not be available on all platforms, or may not be easy to write immediately. My API allows partial compliance (by returning None in various functions). Of course this may be a pain to deal with at the application level, but it avoids having to fudge too much (if e.g. your on-screen keyboard won't give you a physical_key).
We have two desirable translations from native keysyms to labels/values: including modifiers and excluding them.
I completely agree and both proposals solve this problem. Yours have 'label' and 'transformed_key', and the other has 'logical_key' and 'Char'. Yours may be a bit cleaner by using the same type for 'label' and 'transformed_key' but again this is something that can be adopted by the other API.
Translations to/from various formats may not be available on all platforms, or may not be easy to write immediately. My API allows partial compliance [...]
This is the strongest argument in my opinion. In fact this convinced me that your proposal is objectively better. The only thing I'm still a bit worried about is whether an implementation exists for this API for the minimal required featureset on all platforms, but I guess the only way to find out is to try implementing it.
Hold up a minute. After giving this more thought I'm back on the fence. The only functions you proposed returning Options are
fn find_keysym(&self, label: &KeyLabel, layout: KeyboardLayout) -> Option<KeySym>;
fn get_key_location(&self, keysym: KeySym) -> Option<KeyLocation>;
fn find_keysym_by_location(&self, location: KeyLocation) -> Option<KeySym>;
But again, it seems to me that these are not required if we choose the other API. Is there a use-case not covered by the other API where these are needed?
Yet again I updated my proposal to contain more documentation and also changed how text input is reported inspired by @dhardy's latest proposal.
Do we need the transformed (character) output in addition to CompositionEvent? I'm still not really clear on how that API works. Does it return CompositionEnd(text) for all text input?
Your updated proposal looks adequate, I guess. I'm not sure whether KeyLabel and KeyPosition will need None/Unknown variants to handle discrepancies between platforms and input devices.
e.g. your on-screen keyboard won't give you a physical_key
Do on-screen keyboards ever not imitate real keyboards? Windows' built-in on-screen keyboard looks like it's indistinguishable from a regular keyboard (if you ignore the window focus stuff).
Additionally, some keys like my keyboard's dedicated ⏹, ⏮️, ⏯️, and ⏭️ keys don't emit a unique scancode and instead give you 0 on Windows.
I'm not sure if you could "fake" a KeyPosition for these keys since I don't know exactly how special their behaviour is. uievents-code would have you believe that this is reasonable to do, so there might be a way to get this to work on all platforms.
Instead of a transformed Option<&'static str>, there should just be a transformed KeyLabel. This way, you can get at the second layer of the numpad, which would otherwise be inaccessible with the current API.
The modifier-independent KeyLabel should probably be an Option<KeyLabel> for now since the web API this would depend on seems to only be an early draft and is implemented only in Chrome and Chromium-derivatives (and partially at that).
I guess. I'm not sure whether KeyLabel and KeyPosition will need None/Unknown variants to handle discrepancies between platforms and input devices.
I think it's a good idea to do this. Such variants should also contain platform-specific values which allow you to somewhat uniquely identify the keypress, at least for KeyPosition. This would be somewhat in line with some games I've played (can't remember which ones) and Discord (Discord lets me use F20-24 as keybinds, but displays them as "UNK131-135").
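A minimal sketch of an Unknown variant that still carries a platform-specific value might look like the following. The enum, its variants and the "UNK" display format are assumptions made for illustration (the format simply imitates the Discord example); nothing here is the proposal's actual definition.

```rust
use std::fmt;

/// Hypothetical physical-key type; names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum PhysicalKey {
    KeyA,
    F1,
    /// No portable name exists; carries the raw platform-specific code.
    Unknown(u32),
}

impl fmt::Display for PhysicalKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PhysicalKey::KeyA => write!(f, "A"),
            PhysicalKey::F1 => write!(f, "F1"),
            // Unnamed keys stay distinguishable by their raw code.
            PhysicalKey::Unknown(code) => write!(f, "UNK{}", code),
        }
    }
}
```

Because the raw code takes part in equality and hashing, two different unnamed keys can still be bound to different actions even though neither has a portable name.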
Bikeshed: I really prefer PhysicalKey/LogicalKey over KeyPosition/KeyLabel. It's not that important, but I think those names are more in line with the Physical/Logical split in the dpi module. Keyboard input is different from dpi, but I feel like it's similar enough for the analogy to make some sense.
Do we need the transformed (character) output in addition to CompositionEvent?
Unfortunately we do. I didn't realize this earlier myself but as @maroider pointed out, the transformed input has to be able to represent non-printable keys like Insert and Delete due to NumLock shenanigans.
Furthermore I updated the code so that a documentation comment hopefully clears up when a CompositionEvent is triggered.
I'm not sure whether KeyLabel and KeyPosition will need None/Unknown variants to handle discrepancies between platforms and input devices.
You are right, definitely. I added those too.
Do on-screen keyboards ever not imitate real keyboards?
There is no guarantee they do imitate real keyboards. Even if they do, they might contain keys that cannot be sensibly mapped to real keyboard positions, which should report Unknown positions in my opinion.
Additionally, some keys like my keyboard's dedicated ⏹, ⏮️, ⏯️, and ⏭️ keys don't emit a unique scancode and instead give you 0 on Windows.
Exactly. I don't think that winit should somehow try to come up with a position for those keys. It should just be Unknown.
Instead of a transformed Option<&'static str>, there should just be a transformed KeyLabel. This way, you can get at the second layer of the numpad, which would otherwise be inaccessible with the current API.
Thanks again for pointing that out. I updated the API according to this.
The modifier-independent KeyLabel should probably be an Option<KeyLabel> for now since the web API this would depend on seems to only be an early draft and is implemented only in Chrome and Chromium-derivatives (and partially at that).
Hmm, that is unfortunate indeed. Although I think this can be handled relatively gracefully until that API gains more widespread support. Instead of making the logical_key an Option, we could check if the key in the keydown event is lowercase and if it is, use that. Otherwise check if it's uppercase; if it is, call to_lowercase on it, and use that. If it's neither, report Unknown. This would at least allow implementing the most common shortcuts, which in my view is the primary reason we have the logical_key field.
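The fallback described above could be sketched roughly as follows. This is not the web backend's implementation: `LogicalKey` is a reduced hypothetical type holding a `String` for simplicity, and restricting the check to single-character `key` values (so named keys such as "Enter" also end up as `Unknown` here) is an assumption of this sketch.

```rust
/// Reduced, hypothetical logical-key type for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
enum LogicalKey {
    Unicode(String),
    Unknown,
}

/// Derive a modifier-independent key from the (modifier-dependent) `key`
/// string of a keydown event: lowercase is used as-is, uppercase is
/// lower-cased, anything else is reported as unknown.
fn logical_key_fallback(key: &str) -> LogicalKey {
    let mut chars = key.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) if c.is_lowercase() => LogicalKey::Unicode(key.to_string()),
        (Some(c), None) if c.is_uppercase() => LogicalKey::Unicode(c.to_lowercase().collect()),
        _ => LogicalKey::Unknown,
    }
}
```

Note that symbols such as "?" or "1" have no case and therefore fall through to `Unknown`, which is the limitation that would need to be documented.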
Bikeshed: I really prefer PhysicalKey/LogicalKey over KeyPosition/KeyLabel
Not a problem for me. In the updated version I renamed them like this.
Instead of making the logical_key an Option, we could check if the key in the keydown event is lowercase and if it is, use that. Otherwise check if it's uppercase; if it is, call to_lowercase on it, and use that. If it's neither, report Unknown.
If it's implemented this way, then it should be documented very clearly.
Instead of a transformed Option<&'static str>, there should just be a transformed KeyLabel. This way, you can get at the second layer of the numpad, which would otherwise be inaccessible with the current API.
Thanks again for pointing that out. I updated the API according to this.
Alas, I've led you slightly astray on this one. The layer I was worried about is the base layer, which would still be accessible as logical_key. The second layer (accessible with Num Lock on) is the one with numeric inputs. I usually have Num Lock on, so that's why I got myself confused. I think LogicalKey is more semantically correct (and can handle more peculiar layouts), but Option<&'static str> would have worked with most layouts.
/// Note that the `Unicode` variant may contain multiple characters.
/// For example when pressing <kbd>^</kbd> using a US-International
/// layout, this will be `Dead` for the first keypress and will be
/// `Unicode("^^")` for the second keypress.
I'm not sure if this is how dead keys behave on every platform. It's been my experience that pressing ^ twice on Linux will give me a single "^". My layout isn't "US-International", but I doubt that's the issue. In either case, since this is a "dead key thing", it should be handled in CompositionEvent.
You could potentially get away with LogicalKey::Unicode("^") (on the first keypress) here by cheating a little on the Web backend and waiting for the first compositionupdate, which will (hopefully) reveal which dead key was pressed, since each dead key ought to produce a unique combining character. The modifier-independent value will still have to make do with LogicalKey::Dead though.
The way a text editor would have to handle this would be to ignore transformed_key when there's a CompositionEvent, since dead keys ought to fire CompositionEvents.
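As a rough sketch of that rule, a text buffer could prefer composition output whenever a composition event accompanies the key event and fall back to the transformed key otherwise. All types below are simplified, hypothetical stand-ins (the real proposal's fields and variants may differ), and the sketch assumes dead keys do produce composition events, which the thread notes is not true on every platform.

```rust
/// Hypothetical, simplified stand-ins for the types discussed in this thread.
#[derive(Debug, Clone, PartialEq, Eq)]
enum LogicalKey {
    Unicode(String),
    Dead,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum CompositionEvent {
    Start,
    Update(String),
    End(String),
}

struct KeyInput {
    transformed_key: LogicalKey,
    composition: Option<CompositionEvent>,
}

/// Append text to `buffer`, ignoring `transformed_key` whenever a
/// composition event is attached to the same input.
fn apply_input(buffer: &mut String, input: &KeyInput) {
    match &input.composition {
        // Composition finished: commit the composed text.
        Some(CompositionEvent::End(text)) => buffer.push_str(text),
        // Composition still in progress: nothing is committed yet.
        Some(_) => {}
        // No composition: plain key text goes straight into the buffer.
        None => {
            if let LogicalKey::Unicode(text) = &input.transformed_key {
                buffer.push_str(text);
            }
        }
    }
}
```

Under these assumptions, a dead ^ followed by e commits "ê" once, rather than inserting both the composed text and the text of the second keypress.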
Other than that, the only thing I have issues with is the shape of the text input part of this API. It feels subtly wrong, but I can't seem to figure out a better way to do it.
It does seem clunky. I'd still like to see a tabulation of what data is available. Something like:
Meanwhile we could categorise input as:
Optimisations and redundancies:
scancode
WindowEvent and use &'a str instead of String
KeyRelease a separate event type and only include the physical location? No, since this may be Unknown.
I still think there's reason to consider my scancode/keysym API instead of @ArturKovacs's; it has a smaller message size and avoids having to translate to all types of input for both press and release.
Also, my experience with KAS is that one might have complex rules to determine how to handle a press, but handling release is normally a simple dictionary-removal using the scancode as a unique key. Without the scancode there is no unique key. Alternatively we could go with @ArturKovacs's API but add scancode, then (maybe) use a different event type with only the scancode for KeyRelease.
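The press/release asymmetry described here can be sketched with a map keyed by scancode. This is an illustration under assumptions, not KAS's actual code: `ScanCode`, `Action` and `PressedKeys` are hypothetical names, and the press-side rules are reduced to the caller choosing an action.

```rust
use std::collections::HashMap;

/// Hypothetical raw key identifier, assumed unique per physical key.
type ScanCode = u32;

/// Hypothetical actions an application might bind to keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    MoveLeft,
    MoveRight,
}

/// Tracks which action each currently held key triggered.
#[derive(Default)]
struct PressedKeys {
    active: HashMap<ScanCode, Action>,
}

impl PressedKeys {
    /// On press, the (possibly complex) decision has already been made;
    /// remember it under the key's scancode.
    fn press(&mut self, scancode: ScanCode, action: Action) {
        self.active.insert(scancode, action);
    }

    /// On release, a plain removal by scancode is enough.
    fn release(&mut self, scancode: ScanCode) -> Option<Action> {
        self.active.remove(&scancode)
    }
}
```

The release path never needs the logical key or the modifier state, only a key that uniquely identifies what was pressed, which is the argument for including the scancode in release events.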
It appears that I have been terribly naive in assuming we can simply use a scancode and translate to a representation of our choice. Win32 uses a "virtual-key code" and a "scan code" and sometimes requires both plus further state for translations. Keyboard input structs have been extended for non-keyboard input and so may lack a scancode, however in this case we might choose to deliver only text input and not key input.
Also, Win32 doesn't appear to have any way of differentiating "physical location" and "key labels according to the current layout"; it simply has a Virtual-Key Code (whose value presumably depends on the current layout). We might be able to get around this by loading a specific layout such as standard US English and using this for translation in addition to the active layout? Or we could attempt translation from scan-codes, though I believe those are device dependent so probably not viable.
This makes it difficult to do better than the current API.
Are we even able to properly associate key, character and IME input? The composition-event branch lists all three as separate events.
For convenience here's one more link to my updated proposal.
I'm not sure if this is how dead keys behave on every platform. It's been my experience that pressing ^ twice on Linux will give me a single "^"
I didn't know this. Alright... I think this should match the platform-specific behaviour then, as I would certainly expect all applications on one platform to behave similarly to each other.
In either case, since this is a "dead key thing", it should be handled in CompositionEvent.
That was my thought as well, but when I tested it with Firefox on Windows, the JavaScript API's key field only contained Dead on the first keypress and the isComposing field was set to false.
You could potentially get away with LogicalKey::Unicode("^") (on the first keypress) here by cheating a little on the Web backend and wait for the first compositionupdate
Again, this is not treated as a composition event, at least on the Web, but even then I don't think this is a good idea, because we want to let applications know about at least the physical aspects of keypress events as soon as they happen. But then it becomes difficult to conveniently communicate the relationship between the physical keypress and the composition event. Admittedly, that aspect is already imperfect in my current proposal, because the CompositionEnd may be detached from the physical keypress.
I'd still like to see a tabulation of what data is available.
The tabulation you just made there seems accurate to me. To answer a few questions there: I'm definitely not against exposing the platform-specific scancode through, say, an Ext trait. Otherwise the physical key is the platform-independent representation of the scancode, which I think you know I didn't want to second-guess.
I don't have a strong opinion about whether the Unicode variant should be allowed to have a length of 0. If you think it's beneficial to provide a guarantee regarding whether it can be empty, I would say that there is no problem with that. We can just convert empty strings coming from the OS to Unknown on the implementation side, provided that it's not a dead key input.
The max length of the Unicode variant of the transformed input is unknown as far as I can tell. It's up to the platform's implementation how it wants to present that aspect of text input to applications, so it can be any positive length, yeah.
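A minimal sketch of that conversion, assuming a hypothetical LogicalKey type with Unicode, Dead and Unknown variants (the names are illustrative and the real proposal may differ):

```rust
/// Hypothetical logical-key type; the variant names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LogicalKey {
    /// Guaranteed non-empty by `logical_key_from_os`.
    Unicode(String),
    Dead,
    Unknown,
}

/// Converts the text reported by the OS into a `LogicalKey`.
/// Empty text becomes `Unknown` unless the input was a dead key.
pub fn logical_key_from_os(text: &str, is_dead_key: bool) -> LogicalKey {
    if is_dead_key {
        LogicalKey::Dead
    } else if text.is_empty() {
        LogicalKey::Unknown
    } else {
        // Any positive length is accepted; the platform decides how long.
        LogicalKey::Unicode(text.to_owned())
    }
}
```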
"'Compose start" is always a press event?
Yes (unless we find out during implementation that this cannot be guaranteed).
Release events don't need to include most of the above data so long as they can be matched against press events; currently this is done via scancode
I think there must be a balance between avoiding redundancy and ease of use. In my opinion sending all the information once more together with the key release does not tip this balance. Could you show a specific example or otherwise explain where this redundancy is undesirable?
Maybe we can attach a lifetime to WindowEvent and use &'a str instead of String
I don't see what the argument for using a reference would be here. Even if you have to make an allocation when taking the string from the OS, the performance impact of such allocations is fully negligible. Unless this is proven otherwise, I don't think we should consider introducing non-static lifetimes into this interface.
I don't see any, despite four simultaneous ways of representing input above
I'm not sure what you are referring to here. Is it the press, repeat, release, and composition?
I still think there's reason to consider my scancode/keysym API instead of @ArturKovacs's; it has a smaller message size and avoids having to translate to all types of input for both press and release.
With an estimate favoring your argument, it saves, let's say, 30 bytes of memory, when a smartwatch has at least a million times that much. It also saves maybe a few microseconds from a function that gets called once every 50,000 microseconds if the user types at 20 keystrokes a second, which by my calculations is faster than the fastest recorded typing speed.
So with differences this small, I don't think lower memory use and better performance are valid arguments for picking a particular API.
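For anyone who wants to check the arithmetic, a quick sketch follows. The function names are made up for this comment, and the 30 bytes and 20 keystrokes per second are the rough estimates quoted above, not measurements.

```rust
/// Microseconds between key events at a given typing rate.
pub const fn micros_per_keystroke(keystrokes_per_second: u64) -> u64 {
    1_000_000 / keystrokes_per_second
}

/// Memory size that is `factor` times a per-event saving, in bytes.
pub const fn scaled_bytes(saved_bytes: u64, factor: u64) -> u64 {
    saved_bytes * factor
}
```

At 20 keystrokes a second this gives one event every 50,000 microseconds, and a million times 30 bytes is 30,000,000 bytes, or roughly 30 MB.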
Without the scancode there is no unique key.
I see. I didn't know about this use case before so I just added the scancode to my proposal.
I'm not sure what you are referring to here. Is it the press, repeat, release, and composition?
Physical location, label (unshifted translation by current layout), translated (unicode + control chars), IME.
Of these, physical location and label may be the same thing (some type of VirtualKeyCode), but with the first using a fixed layout (US) and the latter using the active layout.
Translated input and IME input are roughly the same except that the former may include control chars and the latter may be delayed (and may be passed during multiple edit states).
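As a rough sketch of how these four representations could sit side by side, assuming hypothetical types KeyId, KeyInput and ImeInput (none of these names exist in winit). The example value assumes an AZERTY layout, where the key in the US "Q" position is labelled "A".

```rust
/// Hypothetical key identifier shared by the "physical" and "label" views.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyId {
    A,
    Q,
    Unknown,
}

/// Key input as seen from three of the four angles listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KeyInput {
    /// Physical location, named after a fixed US layout.
    pub physical: KeyId,
    /// Label: unshifted translation by the active layout.
    pub label: KeyId,
    /// Translated input: Unicode text, possibly including control chars.
    pub translated: Option<String>,
}

/// IME input arrives separately, may be delayed, and may pass through
/// several edit states before it is committed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ImeInput {
    Preedit(String),
    Commit(String),
}

/// Example: Shift + the key labelled "A" on an AZERTY layout.
pub fn azerty_shift_a() -> KeyInput {
    KeyInput {
        physical: KeyId::Q,
        label: KeyId::A,
        translated: Some("A".to_owned()),
    }
}
```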
It appears that I have been terribly naive in assuming we can simply use a scancode and translate to a representation of our choice. Win32 uses a "virtual-key code" and a "scan code" and sometimes requires both plus further state for translations.
Eh, you might be able to get away with using MapVirtualKeyW or MapVirtualKeyExW with MAPVK_VSC_TO_VK_EX and the scancode to get the corresponding vkey. Not sure if this would match some of the "interesting" quirks with the scancode+vkey combinations you get directly.
Also, Win32 doesn't appear to have any way of differentiating "physical location" and "key labels according to the current layout"; it simply has a Virtual-Key Code (whose value presumably depends on the current layout).
You have to go out of your way to get this information. My understanding is that the non-alphanumeric keys can't be changed much (if at all) from one layout to another.
Keyboard input structs have been extended for non-keyboard input and so may lack a scancode, however in this case we might choose to deliver only text input and not key input.
Good catch. PhysicalKey::Unknown could also work here.
Also, Win32 doesn't appear to have any way of differentiating "physical location" and "key labels according to the current layout"; it simply has a Virtual-Key Code (whose value presumably depends on the current layout). We might be able to get around this by loading a specific layout such as standard US English and using this for translation in addition to the active layout?
Yeah, there's no native solution for this. You'd have to load the current keyboard layout, say before every Event::NewEvents, since I don't think Windows notifies you that the layout has changed. There's probably also a case to be made for loading the keyboard layout on every keyboard event, since you can change the layout with a keyboard shortcut (Win+Space bar).
You'd then have to check the vkey to see if it's a functional key, control pad key, arrow key, numpad key, function key, media key or backspace. If it's one of those, then I think you don't have to look further. For the other keys, you might have to use ToAsciiEx or ToUnicodeEx to get the value that's meant to be produced by a keypress + some set of modifiers.
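A sketch of that two-step check, kept free of actual Win32 calls so it compiles anywhere. The vkey ranges are the documented virtual-key values; the Translated type and the to_text closure (standing in for a ToUnicodeEx call with the active layout and modifier state) are assumptions made for illustration, and the category list is not exhaustive.

```rust
/// Hypothetical result of translating a key press.
#[derive(Debug, PartialEq, Eq)]
pub enum Translated {
    /// A key that never produces text; identified by its vkey alone.
    Named(u16),
    /// Text produced by the key under the current layout and modifiers.
    Text(String),
    Unknown,
}

/// Returns true for vkeys that can be handled without layout translation.
pub fn is_non_text_vkey(vkey: u16) -> bool {
    matches!(
        vkey,
        0x08              // VK_BACK
        | 0x21..=0x28     // VK_PRIOR..VK_DOWN: PageUp, PageDown, End, Home, arrows
        | 0x2D | 0x2E     // VK_INSERT, VK_DELETE
        | 0x60..=0x69     // VK_NUMPAD0..VK_NUMPAD9
        | 0x70..=0x87     // VK_F1..VK_F24
        | 0xAD..=0xB3     // volume and media keys
    )
}

/// `to_text` stands in for a ToUnicodeEx-style lookup.
pub fn translate(vkey: u16, to_text: impl Fn(u16) -> Option<String>) -> Translated {
    if is_non_text_vkey(vkey) {
        return Translated::Named(vkey);
    }
    match to_text(vkey) {
        Some(text) if !text.is_empty() => Translated::Text(text),
        _ => Translated::Unknown,
    }
}
```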
Or we could attempt translation from scan-codes, though I believe those are device dependent so probably not viable.
From "Keyboard Scan Code Specification":
Under all Microsoft operating systems, all keyboards actually transmit Scan Code Set 2 values down the wire from the keyboard to the keyboard port. These values are translated to Scan Code Set 1 by the i8042 port chip. The rest of the operating system, and all applications that handle scan codes expect the values to be from Scan Code Set 1. Scan Code Set 3 is not used or required for operation of Microsoft operating systems.
While that document is from the year 2000, it still seems to be the case today that the scancodes you get from Windows are (mostly) from "PS/2 Scan Code Set 1". They are also stable enough that several notable games use them for keybinds. Unfortunately, Windows doesn't emit non-zero scancodes for certain keys, so you can't have every physical key be represented by a scancode. Some of these keys shouldn't be able to be re-mapped in any way, though (outside of gaming keyboard shenanigans), so you might just get away with using the vkey to retrieve the physical location for some of those keys.
Are we even able to properly associate key, character and IME input? The composition-event branch lists all three as separate events.
Now that's something I truly don't know.
In either case, since this is a "dead key thing", it should be handled in CompositionEvent.
That was my thought as well, but when I tested it with Firefox on Windows, the JavaScript API's key field only contained Dead on the first keypress and the isComposing field was set to false.
You could potentially get away with LogicalKey::Unicode("^") (on the first keypress) here by cheating a little on the Web backend and wait for the first compositionupdate
Again, this is not treated as a composition event, at least on the Web, but even then I don't think this is a good idea, because we want to let applications know about at least the physical aspects of keypress events as soon as they happen. But then it becomes difficult to conveniently communicate the relationship between the physical keypress and the composition event. Admittedly, that aspect is already imperfect in my current proposal, because the CompositionEnd may be detached from the physical keypress.
Pressing ^ should fire composition events immediately after the keydown event, unless I've misunderstood Example 26 in the uievents specification. It may, however, be challenging to associate the compositionupdate event with the keydown event.
Thanks for the response @maroider. This seems to indicate that we could omit physical_location from the results and use a function to try mapping (scancode, vkey) to physical_location as well as physical_location → scancode (with both functions returning an Option).
Although that's only viable if all significant platforms function roughly this way.
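Roughly what I have in mind, as a sketch only. PhysicalKey and both function names are hypothetical; the four letter scancodes are Scan Code Set 1 values, and treating play/pause as a key that arrives with scancode 0 is an assumption used to illustrate the vkey fallback.

```rust
/// Hypothetical platform-independent physical key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhysicalKey {
    KeyW,
    KeyA,
    KeyS,
    KeyD,
    MediaPlayPause,
}

/// Tries to map the native (scancode, vkey) pair to a physical key.
pub fn physical_from_native(scancode: u32, vkey: u32) -> Option<PhysicalKey> {
    match (scancode, vkey) {
        // Set 1 scancodes identify the location regardless of layout.
        (0x11, _) => Some(PhysicalKey::KeyW),
        (0x1E, _) => Some(PhysicalKey::KeyA),
        (0x1F, _) => Some(PhysicalKey::KeyS),
        (0x20, _) => Some(PhysicalKey::KeyD),
        // Assumed case: no scancode, so fall back to the vkey
        // (0xB3 is VK_MEDIA_PLAY_PAUSE).
        (0, 0xB3) => Some(PhysicalKey::MediaPlayPause),
        _ => None,
    }
}

/// Reverse mapping; `None` where no scancode exists for the key.
pub fn scancode_from_physical(key: PhysicalKey) -> Option<u32> {
    match key {
        PhysicalKey::KeyW => Some(0x11),
        PhysicalKey::KeyA => Some(0x1E),
        PhysicalKey::KeyS => Some(0x1F),
        PhysicalKey::KeyD => Some(0x20),
        PhysicalKey::MediaPlayPause => None,
    }
}
```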
Eh, you might be able to get away with using MapVirtualKeyW or MapVirtualKeyExW with MAPVK_VSC_TO_VK_EX
Not a good idea since scancode may be 0.
This seems to indicate that we could omit physical_location from the results and use a function to try mapping (scancode, vkey) to physical_location as well as physical_location → scancode (with both functions returning an Option).
Although that's only viable if all significant platforms function roughly this way.
It will likely work on Linux and Windows. The web backend will likely be tricky, challenging or impossible to implement properly. What I've been able to gather from the macOS documentation suggests that it might be possible to implement this. iOS seems to have a clear separation between on-screen and physical keyboards, and what you've just described might be possible to implement. Android also has a clear separation between physical and on-screen keyboards, but Android's documentation also explicitly unifies on-screen keyboards and IMEs. What you've described should also be possible to implement here.
With the above in mind, should mobile on-screen keyboard input be treated as IME input? Fully adapting the mobile APIs for text input will likely require some additions to the IME API later down the road, but implementing whatever is decided upon here would be a huge improvement over the current state (which is essentially unimplemented).
Not a good idea since scancode may be 0.
I can't believe I forgot that.
EDIT: After rereading this comment, I feel like we need a more complete overview of what's available in what form on each platform. I've got a very incomplete document that's kind-of-sort-of that, but it needs more work.
Winit is used for many applications that need to handle different kinds of keyboard input.
KeyboardEvent well.
Currently there are two events for text input in Winit: KeyboardInput and ReceivedCharacter.
The KeyboardInput event carries information about keys pressed and released.
- scancode is a platform-dependent code identifying the physical key.
- virtual_keycode optionally describes the meaning of the key. It indicates ASCII letters, some punctuation and some function keys.
- modifiers tells if the Shift, Control, Alt and Logo keys are currently pressed.
The ReceivedCharacter event sends a single Unicode codepoint. The character can be pushed to the end of a string and if this is done for all events the user will see the text they intended to enter.
Shortcomings
This is my personal list in no particular order.
- VirtualKeyCode is seen as incomplete (#71, #59). Without a given list it is hard to decide which keys to include and when the list is complete. It is also necessary to define each virtual key code so that multiple platforms map keys to the same virtual key codes. While it is probably uncontroversial that ASCII keys should be included, for non-ASCII single keys found on many keyboards, like é, µ, or ü, it is more difficult to decide and to create an exhaustive list.
- VirtualKeyCode should capture the meaning of the key, yet there are different codes for e.g. "0": Key0 and Numpad0, or LControl and RControl.
- ScanCode is platform dependent. Therefore apps wanting to use keys like WASD for navigation will assume a QWERTY layout instead of using the key locations.
- ReceivedCharacter and KeyboardInput events. While this is not necessary for every application, some (like browsers) need it and have to use ugly (and incorrect) work-arounds. (#34)
In general there are many issues that are platform-dependent and where it is unclear what the correct behavior is, or it is not documented. Both alacritty and Servo, just to name two applications, have multiple issues where people mention that keyboard input does not work as expected.
Proposed Solution
Winit is not the first software that needs to deal with keyboard input on a variety of platforms. In particular, the web platform has a complete specification of how keyboard events should behave, which is implemented on all platforms that Winit aims to support.
While the specification talks about JS objects, it can easily be ported to Rust. Some information is duplicated in KeyboardEvent for backwards compatibility, but this can be omitted in Rust so Winit stays simpler. See the keyboard-types crate for what keyboard events can look like in Rust.
- VirtualKeyCode is replaced with a Key. This is an enum with all the values for functional keys and a variant for Unicode values that stores printable characters from the whole Unicode range. (Specification)
- ScanCode is complemented by Code. Codes describe physical key locations in a cross-platform way. (Specification)
- repeat attribute is added.
- Fn, FnLock)
- ReceivedCharacter. Either ReceivedCharacter is kept around for easier use or a utility function is provided that takes keyboard and composition events and emits the printable text.
Implementation
This is obviously a breaking change so there needs to be a new release of winit and release notes. While the proposed events are very expressive it is possible to convert Winit to the new events first and then improve each backend to emit the additional information about key-codes, locations, repeating keys etc.
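To give a feel for the proposed event shape, here is a minimal sketch loosely modelled on the web KeyboardEvent. All type, variant and function names below are illustrative assumptions, not the keyboard-types crate's actual API and not a final design for Winit. Composition events are left out for brevity, so the utility function only covers plain key input.

```rust
/// Meaning of a key: a Unicode variant plus a few functional keys.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Key {
    Unicode(String),
    Enter,
    Backspace,
    Shift,
    Dead,
    Unidentified,
}

/// Physical key location, independent of the active layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Code {
    KeyW,
    KeyA,
    Enter,
    Backspace,
    ShiftLeft,
    Unidentified,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyState {
    Down,
    Up,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KeyboardEvent {
    pub state: KeyState,
    pub key: Key,
    pub code: Code,
    /// True when the event was generated by holding the key down.
    pub repeat: bool,
}

/// Possible stand-in for ReceivedCharacter: derives text from key events.
pub fn printable_text(events: &[KeyboardEvent]) -> String {
    let mut text = String::new();
    for event in events {
        if event.state != KeyState::Down {
            continue; // only presses (including repeats) produce text
        }
        match &event.key {
            Key::Unicode(s) => text.push_str(s),
            Key::Enter => text.push('\n'),
            Key::Backspace => {
                text.pop(); // removes the last char, if any
            }
            _ => {}
        }
    }
    text
}
```

Matching on Key::Unicode with a string pattern keeps application code short, and the Code field lets WASD-style bindings use key locations instead of assuming a layout.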
Thank you for writing and maintaining Winit! I hope this helps to get a discussion about keyboard input handling started and maybe some ideas or even the whole proposal is implemented in Winit.