A keyboard input model - Githubissues

pyfisch commented 5 years ago

TLDR: I think that Winit needs more expressive keyboard events and should follow a written specification to keep platform inconsistencies to a minimum. I propose to adapt the JS KeyboardEvent for winit and to follow the UI Events specification for keyboard input.

Winit is used for many applications that need to handle different kinds of keyboard input.

Games: Physical location of keys like WASD for movement and actions. Text input for names and chat.
GUI applications: Text input and keyboard shortcuts.
The Servo browser: Wants to support JS KeyboardEvent well.

Currently there are two events for text input in Winit: KeyboardInput and ReceivedCharacter.

pub struct KeyboardInput {
    pub scancode: ScanCode,
    pub state: ElementState,
    pub virtual_keycode: Option<VirtualKeyCode>,
    pub modifiers: ModifiersState,
}

The KeyboardInput event carries information about keys pressed and released. scancode is a platform-dependent code identifying the physical key. virtual_keycode optionally describes the meaning of the key. It indicates ASCII letters, some punctuation and some function keys. modifiers tells if the Shift, Control, Alt and Logo keys are currently pressed.

The ReceivedCharacter event sends a single Unicode codepoint. The character can be pushed to the end of a string and if this is done for all events the user will see the text they intended to enter.

Shortcomings

This is my personal list in no particular order.

The list of VirtualKeyCode values is seen as incomplete (#71, #59). Without a reference list it is hard to decide which keys to include and when the list is complete. It is also necessary to define each virtual key code so that multiple platforms map keys to the same virtual key codes. While it is probably uncontroversial that ASCII keys should be included, for non-ASCII single keys found on many keyboards, like é, µ, or ü, it is more difficult to decide and to create an exhaustive list.
While VirtualKeyCode should capture the meaning of the key there are different codes for e.g. "0": Key0 and Numpad0 or LControl and RControl.
The ScanCode is platform dependent. Therefore apps wanting to use keys like WASD for navigation will assume a QWERTY layout instead of using the key locations.
It is unclear if a key is repeated or not. Some applications only want to act on the first keypress and ignore all following repeated keys. Right now these applications need to do extra tracking and are probably not correct if the keyboard focus changes while a key is held down. (#310)
A few useful modifiers like AltGraph and NumLock are missing.
There is no relation between ReceivedCharacter and KeyboardInput events. While this is not necessary for every application some (like browsers) need it and have to use ugly (and incorrect) work-arounds. (#34)
Dead-key handling is unspecified and IMEs (Input Method Editors) are not supported.
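
The "extra tracking" that applications currently need in order to ignore repeated keys can be sketched roughly as follows. This is only an illustration of the workaround, not winit's API: `RepeatFilter`, `focus_lost`, and the simplified `ScanCode` and `ElementState` definitions are assumptions made for this example.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a platform-dependent scancode.
type ScanCode = u32;

/// Simplified stand-in for the press/release state of a key.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ElementState {
    Pressed,
    Released,
}

/// Remembers which keys are held so that repeated `Pressed` events
/// can be told apart from the first press.
#[derive(Default)]
struct RepeatFilter {
    held: HashSet<ScanCode>,
}

impl RepeatFilter {
    /// Returns `true` if this event repeats a key that is already held.
    fn is_repeat(&mut self, scancode: ScanCode, state: ElementState) -> bool {
        match state {
            // `insert` returns `false` when the key was already in the set.
            ElementState::Pressed => !self.held.insert(scancode),
            ElementState::Released => {
                self.held.remove(&scancode);
                false
            }
        }
    }

    /// Forget all held keys. The release event may never arrive once the
    /// window loses keyboard focus, which is the case the text warns about.
    fn focus_lost(&mut self) {
        self.held.clear();
    }
}
```

A `repeat` flag on the event itself, as proposed further down, would make this bookkeeping unnecessary.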

In general there are many issues that are platform-dependent and where it is unclear what the correct behavior is, or it is not documented. Both alacritty and Servo, just to name two applications, have multiple issues where people mention that keyboard input does not work as expected.

Proposed Solution

Winit is not the first software that needs to deal with keyboard input on a variety of platforms. In particular the web platform has a complete specification of how keyboard events should behave, which is implemented on all platforms that Winit aims to support.

While the specification talks about JS objects it can be easily ported to Rust. Some information is duplicated in KeyboardEvent for backwards compatibility but this can be omitted in Rust so Winit stays simpler.

See the keyboard-types crate for what keyboard events can look like in Rust.

(shortcoming 1) VirtualKeyCode is replaced with a Key. This is an enum with all the values for functional keys and a variant for Unicode values that stores printable characters from the whole Unicode range. Specification
(shortcoming 2) is also addressed by this. There is just one value for keys like "Control", but if necessary one can distinguish left/right or keyboard/numpad keys by their location attribute.
(shortcoming 3) ScanCode is complemented by Code. Codes describe physical key locations in a cross-platform way. Specification
(shortcoming 4) A repeat attribute is added.
(shortcoming 5) All known modifier keys are supported. Specification Note: W3C decided to include some keys that are usually handled in hardware and don't emit keyboard events (like Fn, FnLock).
(shortcoming 6) Received characters and keyboard events are now one (for exceptions see below).
(shortcoming 7) To handle dead keys and IMEs a composition event is introduced. It describes the text that should be added at the current cursor position. Specification Note: The introduction of composition events makes it a bit harder to get "just the text" which is currently emitted by ReceivedCharacter. Either ReceivedCharacter is kept around for easier use or a utility function is provided that takes keyboard and composition events and emits the printable text.
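
To make the list above more concrete, here is a minimal sketch of how such an event could look in Rust and how the two typical consumers would read it. All type and function names here (`KeyboardEvent`, `Key`, `Code`, `Location`, `KeyState`, `movement`, `is_right_control`) are assumptions for illustration, and the enums are cut down to a handful of variants. It is not the definition used by keyboard-types or winit.

```rust
/// Meaning of the key (layout-dependent).
#[derive(Clone, Debug, PartialEq, Eq)]
enum Key {
    /// Printable value from the whole Unicode range, e.g. "a", "é", "µ".
    Unicode(String),
    Control,
    Enter,
}

/// Physical key position (layout-independent), named after a US keyboard.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Code {
    KeyW,
    KeyA,
    KeyS,
    KeyD,
    ControlLeft,
    ControlRight,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Location {
    Standard,
    Left,
    Right,
    Numpad,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum KeyState {
    Down,
    Up,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct KeyboardEvent {
    state: KeyState,
    key: Key,
    code: Code,
    location: Location,
    repeat: bool,
}

/// Game-style movement: uses the physical position and ignores repeats,
/// so it behaves the same on QWERTY, AZERTY, Dvorak, and so on.
fn movement(event: &KeyboardEvent) -> Option<(i32, i32)> {
    if event.state != KeyState::Down || event.repeat {
        return None;
    }
    match event.code {
        Code::KeyW => Some((0, -1)),
        Code::KeyA => Some((-1, 0)),
        Code::KeyS => Some((0, 1)),
        Code::KeyD => Some((1, 0)),
        _ => None,
    }
}

/// One `Key::Control` value; left and right are told apart by `location`.
fn is_right_control(event: &KeyboardEvent) -> bool {
    event.key == Key::Control && event.location == Location::Right
}
```

On an AZERTY layout the key at the `KeyW` position reports `Key::Unicode("z")`, yet `movement` still treats it as "up" because it only looks at `code`.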

Implementation

This is obviously a breaking change so there needs to be a new release of winit and release notes. While the proposed events are very expressive it is possible to convert Winit to the new events first and then improve each backend to emit the additional information about key-codes, locations, repeating keys etc.

Thank you for writing and maintaining Winit! I hope this helps to get a discussion about keyboard input handling started and maybe some ideas or even the whole proposal is implemented in Winit.

ArturKovacs commented 4 years ago

It seems to me that @pyfisch's original proposal suggested defining variants only for non-printable keys. I agree with this and I don't see why printable characters should have an enum variant/const value.

It's much nicer to match on Key::Unicode("ÿ") than on Key::YDiaeresis and it makes application developers' job easier when for example comparing pre-defined characters with the "current" input.
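
As a small illustration of this point, matching against a string-carrying variant might look like the sketch below. The `Key` enum, `describe`, and `is_shortcut` are hypothetical names for this example; only the `Key::Unicode("ÿ")` pattern comes from the comment above.

```rust
/// Hypothetical key type: one variant for all printable keys,
/// named variants only for non-printable ones.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Key {
    Unicode(&'static str),
    Enter,
    Escape,
}

/// String literals can be used directly as patterns.
fn describe(key: Key) -> &'static str {
    match key {
        Key::Unicode("ÿ") => "y with diaeresis",
        Key::Unicode("+") => "plus",
        Key::Unicode(_) => "other printable key",
        Key::Enter => "enter",
        Key::Escape => "escape",
    }
}

/// Comparing the current input with a character that is only known at
/// run time (for example from a config file) needs no lookup table.
fn is_shortcut(key: Key, configured: &str) -> bool {
    matches!(key, Key::Unicode(s) if s == configured)
}
```

With a `Key::YDiaeresis`-style enum, `is_shortcut` would instead need a mapping from every variant name to its character.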

dhardy commented 4 years ago

This appears to be @pyfisch's proposal.

The key field contains "translated" input, so for example if one wishes to match against + input this is fine; if one wishes to match against shift+= it is not (though arguably this shouldn't be matched anyway). This field does not distinguish between e.g. left and right Control keys or main-area numbers and the numpad, but the location field does (though as with code below, will extra computation be required to calculate this value?).

There may be another issue with key: modifiers affect the result (e.g. Shift + c = C). So does Ctrl + c emit c with the "control" modifier? For whatever reason, on X11, the current ReceivedCharacter(char) event emits U+03 instead, so making this interface work "as expected" might not be so easy... I don't know.

The code field is what we referred to recently as location. As was mentioned recently, it may not be desirable to include this field since (1) it will likely include extra computations per key-stroke, even if unneeded, and (2) it may not be available on all platforms (e.g. it makes little sense for virtual/screen keyboards). But if this field is removed, something else is needed (probably scancode).

Other than that, provided this interface can be implemented for all platforms, I think it is sufficient.

pyfisch commented 4 years ago

There may be another issue with key: modifiers affect the result (e.g. Shift + c = C). So does Ctrl + c emit c with the "control" modifier? For whatever reason, on X11, the current ReceivedCharacter(char) event emits U+03 instead, so making this interface work "as expected" might not be so easy... I don't know.

X11 keyboard handling sucks. What you want to use on X11 and Wayland is xkbcommon which does not have this and many other quirks. There are various rust bindings for it, but they are somewhat outdated/incomplete but with bindgen you can easily make your own binding.

kchibisov commented 4 years ago

There may be another issue with key: modifiers affect the result (e.g. Shift + c = C). So does Ctrl + c emit c with the "control" modifier? For whatever reason, on X11, the current ReceivedCharacter(char) event emits U+03 instead, so making this interface work "as expected" might not be so easy... I don't know.

X11 keyboard handling sucks. What you want to use on X11 and Wayland is xkbcommon which does not have this and many other quirks. There are various rust bindings for it, but they are somewhat outdated/incomplete but with bindgen you can easily make your own binding.

You'll get the same U+03 on Wayland with libxkbcommon too, that's what you should get, since you're asking for control chars. I'm not sure if you can tell libxkbcommon not to do so, but for example alacritty relies on that, and it's also the case for all (at least Windows/macOS/Wayland/X11) platforms to send control chars for control keys. It's not a bug or anything, it's just what it is.
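
For readers unfamiliar with where U+03 comes from: the conventional ASCII rule is that Ctrl clears the upper bits of the character, so `c` (0x63) becomes 0x03. The sketch below shows that rule and its inverse for letters. It is an assumption-laden illustration (`control_char` and `base_letter` are made-up names) and does not claim to reproduce exactly what libxkbcommon or any other platform does for every key.

```rust
/// Conventional ASCII control transformation: '@'..='_' and 'a'..='z'
/// map onto the C0 control characters U+0000..=U+001F.
fn control_char(c: char) -> Option<char> {
    match c {
        // Both ranges are ASCII, so the cast to u8 is lossless.
        '@'..='_' | 'a'..='z' => Some(((c as u8) & 0x1f) as char),
        _ => None,
    }
}

/// Inverse for the letter range only: recover the lowercase letter from
/// a control character, e.g. to match "Ctrl + c" instead of U+0003.
fn base_letter(control: char) -> Option<char> {
    match control {
        '\u{01}'..='\u{1a}' => Some((control as u8 + b'a' - 1) as char),
        _ => None,
    }
}
```

The inverse is ambiguous in general (U+001B is both Escape and Ctrl + `[`), which is one reason to expose the untransformed key alongside the control character rather than reconstructing it.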

pyfisch commented 4 years ago

@kchibisov There are functions to get the symbol without the control transformation applied.

kchibisov commented 4 years ago

@kchibisov There are functions to get the symbol without the control transformation applied.

yeah, but you're expecting control chars most of the time, and I'd be surprised to not get them.

dhardy commented 4 years ago

Relevant on control chars: Wikipedia, XKB, xkb transformation without control translations

yeah, but you're expecting control chars most of the time, and I'd be surprised to not get them.

Are these standard enough to be reliably useful? Because the alternative is to use the API the way we're trying to design it: match Control modifier + c instead of U+03.

ArturKovacs commented 4 years ago

I think it should be possible to find out what the pressed key combination was. I'm not saying that it should be what the keyboard input event presents by default but there should at least be a function that allows the user to extract the layout-dependent key that was pressed together with the modifier. If this is too difficult to implement I could accept that this is not something that winit should offer.

kchibisov commented 4 years ago

Are these standard enough to be reliably useful?

Well, for something like a terminal emulator, yeah, since if you suppress those you'll have to bind everything possible to those control chars yourself, which is nearly impossible to do. You can match Control + c and suppress chars in your app without issues if you want just ctrl + c as a hotkey.

kchibisov commented 4 years ago

I think it should be possible to find out what the pressed key combination was. I'm not saying that it should be what the keyboard input event presents by default but there should at least be a function that allows the user to extract the layout-dependent key that was pressed together with the modifier. If this is too difficult to implement I could accept that this is not something that winit should offer.

I'm not sure what you're talking about, it's already possible in winit? Like you always know the modifiers state and the keys you're pressing. And modifiers are a special event. I'd just mention that you can't map scancode -> position, etc, since you can't assume qwerty. So the only thing that you can do to help downstream is to map scancode to Physical(A) or something like that. And have virtual keys like we have right now. If you're planning to match by Received chars it could be tricky and you can't really design it because of compose keys, which could send strings of text in one press.

ArturKovacs commented 4 years ago

I'm not sure what you're talking about, it's already possible in winit?

See #1700 for a more detailed explanation of what I was referring to when I wrote "extract the layout-dependent key that was pressed together with the modifier".

dhardy commented 4 years ago

@ArturKovacs this relates to what we recently discussed: Control + Key gets mapped to some control char code, but it is also possible to get the unmapped version (for Linux platforms at least).

At this point it looks like a minimalist workable API would be to return the scancode (XKB keysym) and expose functions to map to unicode (with and without control char mapping) and to the key location.

kchibisov commented 4 years ago

At this point it looks like a minimalist workable API would be to return the scancode (XKB keysym) and expose functions to map to unicode (with and without control char mapping) and to the key location.

If we want to have something like that, we shouldn't expect clients to do so (it's way more complex than you may think); what we could do is to allow providing a per-window configuration on how modifiers and such input should be handled.

kchibisov commented 4 years ago

I'm not even saying that you may have a control + certain key rule in xkb that will transform to Super + key on the fly, and you can't just ignore such things, meaning that you should always call to xkb with a proper keymap (which you can't pass to winit users, hence not cross platform), and use that control. Also, someone has to handle the compose key; right now it's a job of winit, but if you expose xkb downstream, who should handle composing?

ArturKovacs commented 4 years ago

I looked into the documentation of xkbcommon and it doesn't seem that complicated.

Do I misunderstand something or could this really be implemented as shown below?

```rust
struct KeyboardInput {
    // All platforms
    pub scancode: Scancode,
    pub modifiers: Modifiers,

    // *nix only
    layouts: Layouts,
}

// *nix implementation
impl KeyboardInput {
    fn character_with_mods(&self) -> Option<&'static str> {
        let dep_mod = get_dep_mod(self.modifiers);
        let latched_mod = ...
        let locked_mod = ...
        let dep_layout = get_dep_layout(self.layouts);
        let latched_layout = ...
        let locked_layout = ...
        let xkb_key: xkb_keycode_t = get_xkb_keycode(self.scancode);
        unsafe {
            // Note that this thread-local xkb_state is only used for these transformation functions,
            // and a separate xkb_state can be used in the event loop if needed.
            xkb_state_update_mask(thread_local_xkb_state, dep_mod, ...);
            let sym = xkb_state_key_get_one_sym(thread_local_xkb_state, xkb_key);
            // Look up thread local map of already known `&str`s for keysyms
            // If an &str is found, return that, otherwise continue...
            let size = xkb_state_key_get_utf8(thread_local_xkb_state, xkb_key, ptr::null_mut(), 0) as usize;
            if size == 0 {
                return None;
            }
            // The returned size excludes the trailing NUL byte, so reserve one extra byte
            // to avoid truncation on the second call.
            let mut output = Vec::<u8>::with_capacity(size + 1);
            xkb_state_key_get_utf8(thread_local_xkb_state, xkb_key, output.as_mut_ptr(), output.capacity());
            output.set_len(size);
            // `Box::leak` is an associated function, not a method.
            let utf8 = std::str::from_utf8_unchecked(Box::leak(output.into_boxed_slice()));
            // Add `utf8` to thread local map of known keysyms
            Some(utf8)
        }
    }

    fn character_without_mods(&self) -> Option<&'static str> {
        let dep_mod = EMPTY MOD MASK;
        let latched_mod = EMPTY MOD MASK;
        let locked_mod = EMPTY MOD MASK;
        // and the rest is essentially identical
    }
}
```

kchibisov commented 4 years ago

I looked into the documentation of xkbcommon and it doesn't seem that complicated.

I'm saying that clients shouldn't do libxkbcommon handling, we're already using it on Wayland https://github.com/Smithay/client-toolkit/blob/aac3c503242c8a2a9f37f4a2231e7b540e3a575c/src/seat/keyboard/mod.rs#L422.

ArturKovacs commented 4 years ago

I don't understand what you mean by that. Can you explain why code similar to what I wrote shouldn't be in winit?

kchibisov commented 4 years ago

I don't understand what you mean by that. Can you explain why code similar to what I wrote shouldn't be in winit?

I mean in winit clients, you're free to do that internally in winit if you want to.

dhardy commented 4 years ago

I don't mean expose the xkbcommon functions. I mean make safe wrappers which require only the scancode and the EventLoop or Window as context and return a char. The scancode can be whatever we want to make it and include the keysym, even a layout identifier if necessary.

ArturKovacs commented 4 years ago

In an attempt to move this to the implementation phase I tried to gather all suggestions described so far and I compiled an API which seems to represent the most reasonable compromise. Note that each struct may have implementation specific private fields added.

Please let me know if something seems wrong but otherwise I'd like to start implementing this for Windows in a few days.

I'm also willing to make Linux, macOS, and web implementations.

Note that the following code has been updated several times since posting this so some of the following reflections may be obsolete.

```rust
/// Key events are not reported at the beginning, during, and at the end of composition.
// -------------------
// Developers' note:
// This is due to the limitation that neither Firefox nor Chrome report keypresses correctly
// during composition on Windows, so in order to maintain consistency, this behavior
// is replicated on other platforms too.
enum KeyboardEvent {
    Key(KeyEvent),
    Composition(CompositionEvent),
}

struct KeyEvent {
    scancode: ScanCode,
    physical_key: PhysicalKey,

    /// This value ignores all modifiers including
    /// but not limited to Shift, Caps Lock,
    /// and Ctrl. In most cases this means that the
    /// unicode character in the `Unicode` variant is lowercase.
    ///
    /// Note that this is `LogicalKey::Dead` for dead keys.
    ///
    /// Optimally this wouldn't be the case but unfortunately
    /// this is a limitation of the web API which is applied
    /// for every platform for consistency.
    logical_key: LogicalKey,

    /// This value is affected by all modifiers including but not
    /// limited to Shift, Ctrl, and Num Lock.
    ///
    /// Use this for text input along with `CompositionEvent`.
    ///
    /// Note that the `Unicode` variant may contain multiple characters.
    /// For example on Windows when pressing ^ using
    /// a US-International layout, this will be `Dead` for the first
    /// keypress and will be `Unicode("^^")` for the second keypress.
    /// Note that this behaviour might be different on
    /// other platforms. For example Linux systems may emit a
    /// `Unicode("^")` on the second keypress.
    transformed_key: LogicalKey,

    key_state: ElementState,
    repeat: bool,
}

/// As described at https://www.w3.org/TR/uievents/#events-compositionevents
enum CompositionEvent {
    CompositionStart(String),
    CompositionUpdate(String),
    CompositionEnd(String),
}

/// The layout-dependent key.
///
/// This is identical to the label printed on the key when
/// the currently active layout matches the layout of the
/// labels on the keyboard.
///
/// `Fn` and `FnLock` key events are not emitted by `winit`.
/// These keys are usually handled at the hardware or at the OS level.
#[non_exhaustive]
enum LogicalKey {
    Unicode(&'static str),

    Ctrl(Location),
    Alt(Location),
    ...
    LeftArrow,
    RightArrow,
    ...
    F1,
    F2,
    ...

    /// Dead key. See https://www.w3.org/TR/uievents/#dead-key
    Dead,

    /// Reported when the label of the key cannot be determined.
    /// Note that this is distinct from the `Dead` variant.
    Unknown,
}

/// Represents the position of a key independent of the
/// currently active layout.
/// Synonymous with https://www.w3.org/TR/uievents-code/
///
/// `Fn` and `FnLock` key events are not emitted by `winit`.
/// These keys are usually handled at the hardware or at the OS level.
#[non_exhaustive]
enum PhysicalKey {
    A,
    B,
    ...
    Digit0,
    Digit1,
    ...

    /// Reported when the position cannot be determined in a
    /// cross-platform manner.
    ///
    /// For example an on-screen keyboard or a remote control
    /// may not have a layout for which it's sensible to map
    /// the positions to other values of this enum.
    Unknown
}

#[non_exhaustive]
enum Location {
    Standard,
    Left,
    Right,
    Numpad,
}

/// An opaque struct that uniquely identifies a single physical key on the
/// current platform.
///
/// This is distinct from `PhysicalKey` because this struct will always
/// be a unique identifier for a specific key, whereas `PhysicalKey` may be
/// `Unknown` for multiple distinct keys.
///
/// Furthermore this struct may store a value that cannot be ported
/// to another platform, hence it is opaque. To retrieve the underlying
/// value, use one of the platform-dependent extension traits like
/// `XkbScanCodeExt`
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct ScanCode {
    /// platform dependent private fields to uniquely identify a single key
}

/// For X11 (and maybe wayland as well?)
impl XkbScanCodeExt for ScanCode {
    /// returns `xkb_keycode_t`
    fn keycode(&self) -> u32 {
        self.keycode
    }
}
```

pyfisch commented 4 years ago

This is a very reasonable proposal.

However there is one particular issue I have a strong opinion about. That is the decision to ignore all modifier keys for the LogicalKey::Unicode value. If the KeyboardEvent and CompositionEvent are correctly implemented on a platform (which is hard) you can reconstruct the input text from these two events (and don't need to rely on ReceivedChars or a similar API). This is not possible if modifiers are ignored. An additional concern is that many applications offer single-key shortcuts such as "[", "]", "?" etc. that require Shift or other modifier keys depending on the keyboard layout.

I think there should be an API to get the key value without modifiers, but it must not be the primary API.
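
To illustrate what "reconstructing the input text" from the two event kinds could mean, here is a rough sketch of a text field that consumes both. It assumes that the key value already includes modifiers and that no key events are delivered while a composition is active, as the API draft above states. `InputEvent`, `TextField`, `Key`, and the shortened `CompositionEvent` variant names are all hypothetical.

```rust
/// Hypothetical key value with modifiers already applied.
enum Key {
    Unicode(String),
    Backspace,
    Enter,
}

/// Composition phases, loosely following the UI Events specification.
enum CompositionEvent {
    Start(String),
    Update(String),
    End(String),
}

enum InputEvent {
    KeyDown(Key),
    Composition(CompositionEvent),
}

#[derive(Default)]
struct TextField {
    /// Text that is part of the document.
    committed: String,
    /// Text of the composition in progress (shown underlined by most UIs).
    preedit: String,
}

impl TextField {
    fn handle(&mut self, event: InputEvent) {
        match event {
            InputEvent::KeyDown(Key::Unicode(text)) => self.committed.push_str(&text),
            InputEvent::KeyDown(Key::Backspace) => {
                self.committed.pop();
            }
            // Other non-printable keys do not change the text.
            InputEvent::KeyDown(_) => {}
            InputEvent::Composition(CompositionEvent::Start(_)) => self.preedit.clear(),
            InputEvent::Composition(CompositionEvent::Update(text)) => self.preedit = text,
            InputEvent::Composition(CompositionEvent::End(text)) => {
                self.preedit.clear();
                self.committed.push_str(&text);
            }
        }
    }
}
```

If `Key::Unicode` ignored modifiers, the first arm would insert "a" for Shift + a, which is the problem described above.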

I am happy you want to do some implementations. If you have any questions about Linux or the Web you can ping or email me.

ArturKovacs commented 4 years ago

If the KeyboardEvent and CompositionEvent is correctly implemented on a platform (which is hard) you can reconstruct the input text from these two events [...] I think there should be an API to get the key value without modifiers, but it must not be the primary API.

In my proposal the application may use CompositionEvent::Char for key input with modifiers and use LogicalKey for key input without modifiers. Furthermore every variant of the CompositionEvent is affected by modifier keys, meaning that composition_input is likely the only field one needs for text input.

An additional concern is that many applications offer single-key-shortcuts such as "[", "]", "?" etc. that require Shift or other modifier keys depending on the keyboard layout.

I think this depends on how the application wants to handle these cases.

One way to handle this would be to use the PhysicalKey if position is more important than the label.

Another way is to use the composition input API to look for such input.

if let Some(CompositionEvent::Char(ch)) = input_event.composition_input {
    if ch == '?' {
        // ? was pressed either directly or through a modified key
    }
}

maroider commented 4 years ago

If the KeyboardEvent and CompositionEvent is correctly implemented on a platform (which is hard) you can reconstruct the input text from these two events (and don't need to rely on ReceivedChars or similar API).

Could you elaborate on why "reconstructing the input text" instead of relying on platform APIs is desirable?

pyfisch commented 4 years ago

In my proposal the application may use CompositionEvent::Char for key input with modifiers and use LogicalKey for key input without modifiers. Furthermore every variant of the CompositionEvent is affected by modifier keys, meaning that composition_input is likely the only field one needs for text input.

This seems useful. I did miss CompositionEvent::Char. Although I still can't (easily) find out when a given symbol for which modifiers are needed is released.

An additional concern is that many applications offer single-key-shortcuts such as "[", "]", "?" etc. that require Shift or other modifier keys depending on the keyboard layout.

I think this depends on how the application wants to handle these cases.

One way to handle this would be to use the PhysicalKey if position is more important than the label.

Many applications have mnemonics, so they are about symbol, not location.

If the KeyboardEvent and CompositionEvent is correctly implemented on a platform (which is hard) you can reconstruct the input text from these two events (and don't need to rely on ReceivedChars or similar API).

Could you elaborate on why "reconstructing the input text" instead of relying on platform APIs is desirable?

If you are building a web browser. :wink: I proposed this interface originally for use in the servo browser, but since Mozilla discontinued this project it is not as important now. Besides browsers you will need it for high-quality GUI applications especially in the context of IME. How I understand the CompositionEvent API, its primary purpose is to be able to accurately construct the input text in a text field/word processor. If you decide you don't need this an API with just characters, key-down, key-up is likely sufficient for most games and simple applications.

maroider commented 4 years ago

This seems useful. I did miss CompositionEvent::Char. Although I still can't (easily) find out when a given symbol for which modifiers are needed is released.

It took me a moment to realize what you meant by this, but I think you mean that there's no easy way to tell that e.g. ! was released when Shift+1 is required to produce !.
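
One way an application could work around this today is to remember which symbol each physical key produced when it went down, and look it up again on release. The sketch below shows that idea only; `SymbolTracker` and the integer `PhysicalKey` alias are assumptions for this example and not part of any proposal in this thread.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a physical key.
type PhysicalKey = u32;

/// Remembers the text each held key produced at press time.
#[derive(Default)]
struct SymbolTracker {
    pressed: HashMap<PhysicalKey, String>,
}

impl SymbolTracker {
    /// Record the symbol produced when `key` was pressed, e.g. "!" for Shift + 1.
    fn press(&mut self, key: PhysicalKey, text: &str) {
        self.pressed.insert(key, text.to_owned());
    }

    /// Returns the symbol that was produced on press, even if the modifier
    /// has been released in the meantime.
    fn release(&mut self, key: PhysicalKey) -> Option<String> {
        self.pressed.remove(&key)
    }
}
```

This keeps working if Shift is released before the digit key, a case where re-translating the key at release time would report "1" instead of "!".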

Many applications have mnemonics, so they are about symbol, not location.

I'm beginning to think that a proper keyboard layout query API is required to support the greatest number of use-cases.

pub struct KeyboardLayout {
    layout: platform_impl::KeyboardLayout,
}

impl KeyboardLayout {
    pub fn logical_key(&self, physical_key: PhysicalKey, modifiers: ModifiersState) -> LogicalKey {
        self.layout.logical_key(physical_key, modifiers)
    }
}

impl PhysicalKey {
    pub fn to_logical(self, layout: &KeyboardLayout, modifiers: ModifiersState) -> LogicalKey {
        layout.logical_key(self, modifiers)
    }
}

I think Android, Linux (libxkbcommon) and Windows have good enough APIs to implement the minimal example above, and I think there's an experimental API for this in browser-land but I don't know if it's usable for the above example. I have absolutely no idea what the situation is on either macOS or iOS.

There would probably have to be a KeyboardLayoutChanged(KeyboardLayout) event somewhere, but I'm not sure where to stick it.

   /// This value ignores all modifiers like shift and ctrl, and
   /// it is always uppercase.
   Unicode(&'static str),

I'm not sure if converting to uppercase is the right thing to do here. Some characters don't round-trip losslessly through to_uppercase and to_lowercase. This might be doable if you use what the key would give you if Caps Lock is the only active modifier. Keys affected by Caps Lock have unambiguous mappings between the character emitted without Caps Lock and one emitted with Caps Lock on the layouts I regularly use, but there are a lot of different layouts out there, and one of them might not have this property.
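
The round-trip problem is easy to demonstrate with the standard library. The helper name `round_trips` is made up for this example; the case conversions themselves are plain `str` methods.

```rust
/// Returns `true` if upper-casing and then lower-casing gives back the input.
fn round_trips(s: &str) -> bool {
    s.to_uppercase().to_lowercase() == s
}

/// German sharp s upper-cases to two letters, so the original is lost.
fn sharp_s_uppercase() -> String {
    "ß".to_uppercase()
}
```

Here `round_trips("a")` holds, while `round_trips("ß")` does not, because `sharp_s_uppercase()` yields "SS", which lower-cases to "ss".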

ArturKovacs commented 4 years ago

This seems useful. I did miss CompositionEvent::Char. Although I still can't (easily) find out when a given symbol for which modifiers are needed is released.

Excuse me @pyfisch I completely missed this. Yeah, I think we can give a written guarantee in documentation that every CompositionEvent::Char will come with a Some(KeyboardEvent) from which you can tell whether it's a press or release. This could maybe be expressed by the types themselves but I think any such design will result in pattern matching hell in the application.

   /// This value ignores all modifiers like shift and ctrl, and
   /// it is always uppercase.
   Unicode(&'static str),
I'm not sure if converting to uppercase is the right thing to do here. Some characters don't round-trip losslessly through to_uppercase and to_lowercase.

Yeah that uppercase thing I wasn't entirely sure about and I absolutely didn't know that the conversion is not lossless. Thanks for pointing that out! I'll remove the uppercase guarantee and update this to "This value ignores all modifiers including but not limited to shift, caps lock, and ctrl". Does that sound good?

maroider commented 4 years ago

Yeah that uppercase thing I wasn't entirely sure about and I absolutely didn't know that the conversion is not lossless. Thanks for pointing that out! I'll remove the uppercase guarantee and update this to "This value ignores all modifiers including but not limited to shift, caps lock, and ctrl". Does that sound good?

I think you need a couple more commas (and maybe the <kbd> tag), but yes. "This value ignores all modifiers including, but not limited to, Shift, Caps Lock, and Ctrl"

ArturKovacs commented 4 years ago

Thank you, I updated the comment adding the kbd tags as you suggested. I did not include the additional commas however because I'm confident that the current form is grammatically correct, and I think we should not get into a discussion about English grammar or writing style in this thread.

dhardy commented 4 years ago

I was thinking more like this. (I'm not certain if we need the layout identifier. I left out composition events which could be included as above.) Both KeySym and KeyboardLayout are probably just a u32 internally or even smaller.

/// Platform specific value identifying a key on a keyboard
#[derive(Clone, Debug, PartialEq, PartialOrd, Hash)]
struct KeySym { .. }

/// Platform specific value identifying the keyboard layout (from a list of available ones)
#[derive(Clone, Debug, PartialEq, PartialOrd, Hash)]
struct KeyboardLayout { .. }

struct KeyboardEvent {
    keysym: KeySym,
    layout: KeyboardLayout,
    key_state: ElementState,
    repeat: bool,
}

impl Window {
    fn get_layout_name(&self, layout: KeyboardLayout) -> String;
    fn get_active_layout(&self) -> KeyboardLayout;

    /// Translate the key according to the layout, but disregarding CapsLock and modifier keys
    /// 
    /// If a unicode value is produced, it is usually lower-case.
    fn get_key_label(&self, keysym: KeySym, layout: KeyboardLayout) -> KeyLabel;

    /// Translate the key according to the layout and modifiers
    fn get_transformed_key(&self, keysym: KeySym, layout: KeyboardLayout) -> KeyLabel;

    /// Attempt to find the [`KeySym`] producing this label
    ///
    /// This is not guaranteed to return a result. On some platforms it may
    /// never return a result. In some cases it may arbitrarily choose one of
    /// multiple [`KeySym`]s producing this label.
    fn find_keysym(&self, label: &KeyLabel, layout: KeyboardLayout) -> Option<KeySym>;

    /// Attempt to find the [`KeyLocation`] corresponding to a [`KeySym`]
    fn get_key_location(&self, keysym: KeySym) -> Option<KeyLocation>;

    /// Attempt to find the [`KeySym`] corresponding to a [`KeyLocation`]
    fn find_keysym_by_location(&self, location: KeyLocation) -> Option<KeySym>;
}

#[non_exhaustive]
enum KeyLabel {
    // TODO: maybe this can be backed by [u8; LEN]
    Unicode(&'static str),

    Ctrl(Location),
    Alt(Location),
    ...
    LeftArrow,
    RightArrow,
    ...
    F1,
    F2,
    ...
}

enum Location {
    Standard,
    Left,
    Right,
    Numpad,
}

/// Identifies locations on a keyboard (relative to US Qwerty?)
#[non_exhaustive]
enum KeyLocation {
   ...
}

Edit: added KeyLocation

ArturKovacs commented 4 years ago

I don't see the reason for introducing the KeyboardLayout. Please describe the use-case for it in simple terms so I can understand it 😄

What is the difference between KeyLocation and KeySym? Why isn't KeyLocation enough?

dhardy commented 4 years ago

I don't see the reason for introducing the KeyboardLayout

In general, translation is dependent on the layout, and I think all major OSs allow convenient layout switching, so there's no guarantee that the active layout is the same one as when the app launched. This means that if we give the app an API for translating KeySym → KeyLabel that can be called later, the value may be wrong if we don't account for this. (As an alternative we could embed the layout within KeySym, though if we ever let apps actively switch layouts this will probably come back to haunt us.)
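
A toy version of layout-dependent translation shows why the layout has to be part of the lookup. Everything here is an assumption for illustration: the tuple-struct `KeySym` and `KeyboardLayout`, the `Translator` table, the `US` and `FR` constants, and the numeric key value. A real backend would ask the platform instead of using a hard-coded table.

```rust
use std::collections::HashMap;

/// Hypothetical platform-specific key identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct KeySym(u32);

/// Hypothetical identifier of a keyboard layout.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct KeyboardLayout(u32);

const US: KeyboardLayout = KeyboardLayout(0);
const FR: KeyboardLayout = KeyboardLayout(1);

/// Stand-in for the platform's translation tables.
struct Translator {
    table: HashMap<(KeyboardLayout, KeySym), &'static str>,
}

impl Translator {
    /// Example data: the same key yields "q" on a US layout
    /// and "a" on a French AZERTY layout.
    fn example() -> Self {
        let mut table = HashMap::new();
        table.insert((US, KeySym(24)), "q");
        table.insert((FR, KeySym(24)), "a");
        Translator { table }
    }

    /// Translate a key using the layout that was active when the event
    /// was produced, not whatever layout is active at call time.
    fn key_label(&self, keysym: KeySym, layout: KeyboardLayout) -> Option<&'static str> {
        self.table.get(&(layout, keysym)).copied()
    }
}
```

If the event did not carry the layout, a call made after the user switched from `US` to `FR` would label a previously recorded "q" press as "a".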

What is the difference between KeyLocation and KeySym? Why isn't KeyLocation enough?

Two things I guess. One is that KeyLocation is whatever enum we define; there's no guarantee that it will contain a unique value for every key on every keyboard so if we map the OS's identifier to this, then back again for translation to KeyLabel, the result may be lossy. Secondly, that's an extra translation step.

Note: KeySym is what I previously called scancode. This is vaguely modelled on the XKB API.

ArturKovacs commented 4 years ago

In general, translation is dependent on the layout [...]

Alright that's fair, but if I'm not mistaken the KeyboardLayout is not needed if we stick with the other proposal.

One is that KeyLocation is whatever enum we define [...] Secondly, that's an extra translation step.

I see.

These points are all completely valid, but only when using the architecture you proposed. However, I'm not seeing why your proposal is objectively better than the other one. I see two aspects where it's an improvement over the other one:

The translation from native keysyms into a KeyLabel is only done when the application truly needs it, so one can save a bit of processing time. I don't think this alone is a good argument, though, because the performance impact is negligible for something that happens as rarely as a keypress.
It's cleaner to get text input for non-composition events. This isn't a good enough argument either to switch to the API you are proposing because the other one can be slightly tweaked to match this behaviour. Namely by removing the Char variant of the CompositionEvent and adding a character field to the KeyboardEvent.

At the same time it adds the burden of having to keep track of the KeyboardLayout which of course is not a burden if it has a use that I'm not seeing right now.

dhardy commented 4 years ago

Well, this API is certainly less simple to use, but its advantages are a little more extensive:

We have two desirable translations from native keysyms to labels/values: including modifiers and excluding them. As @kchibisov said above, you ideally want both (so your API should have two versions of logical_key):

Well, for something like terminal emulator, yeah, since if you suppress those you'll have to bind everything possible to those control chars yourself, which is nearly impossible to do. You can match Control + c and suppress chars in your app without issues if you want just ctrl + c as a hotkey.
Translations to/from various formats may not be available on all platforms, or may not be easy to write immediately. My API allows partial compliance (by returning None in various functions). Of course this may be a pain to deal with at the application level, but it avoids having to fudge too much (if e.g. your on-screen keyboard won't give you a physical_key).

ArturKovacs commented 4 years ago

We have two desirable translations from native keysyms to labels/values: including modifiers and excluding them.

I completely agree and both proposals solve this problem. Yours has 'label' and 'transformed_key', and the other has 'logical_key' and 'Char'. Yours may be a bit cleaner by using the same type for 'label' and 'transformed_key' but again this is something that can be adopted by the other API.

Translations to/from various formats may not be available on all platforms, or may not be easy to write immediately. My API allows partial compliance [...]

This is the strongest argument in my opinion. In fact this convinced me that your proposal is objectively better. The only thing I'm still a bit worried about is whether an implementation exists for this API for the minimal required featureset on all platforms, but I guess the only way to find out is to try implementing it.

ArturKovacs commented 4 years ago

Hold up a minute. After giving this more thought I'm back on the fence. The only functions you proposed returning Options are

fn find_keysym(&self, label: &KeyLabel, layout: KeyboardLayout) -> Option<KeySym>;
fn get_key_location(&self, keysym: KeySym) -> Option<KeyLocation>;
fn find_keysym_by_location(&self, location: KeyLocation) -> Option<KeySym>;

But again, it seems to me that these are not required if we choose the other API. Is there a use-case not covered by the other API where these are needed?

ArturKovacs commented 4 years ago

Yet again I updated my proposal to contain more documentation and also changed how text input is reported inspired by @dhardy's latest proposal.

dhardy commented 4 years ago

Do we need the transformed (character) output in addition to CompositionEvent? I'm still not really clear on how that API works. Does it return CompositionEnd(text) for all text input?

Your updated proposal looks adequate, I guess. I'm not sure whether KeyLabel and KeyPosition will need None/Unknown variants to handle discrepancies between platforms and input devices.

maroider commented 4 years ago

e.g. your on-screen keyboard won't give you a physical_key

Do on-screen keyboards ever not imitate real keyboards? Windows' built-in on-screen keyboard looks like it's indistinguishable from a regular keyboard (if you ignore the window focus stuff).

Additionally, some keys like my keyboard's dedicated ⏹, ⏮️, ⏯️, and ⏭️ keys don't emit a unique scancode and instead give you 0 on Windows. I'm not sure if you could "fake" a KeyPosition for these keys since I don't know exactly how special their behaviour is. uievents-code would have you believe that this is reasonable to do, so there might be a way to get this to work on all platforms.

Instead of a transformed Option<&'static str>, there should just be a transformed KeyLabel. This way, you can get at the second layer of the numpad, which would otherwise be inaccessible with the current API.

The modifier-independent KeyLabel should probably be a Option<KeyLabel> for now since the web API this would depend on seems to only be an early draft and is implemented only in Chrome and Chromium-derivatives (and partially at that).

I guess. I'm not sure whether KeyLabel and KeyPosition will need None/Unknown variants to handle discrepancies between platforms and input devices.

I think it's a good idea to do this. Such variants should also contain platform-specific values which allow you to somewhat uniquely identify the keypress, at least for KeyPosition. This would be somewhat in line with some games I've played (can't remember which ones) and Discord (Discord lets me use F20-24 as keybinds, but displays it as "UNK131-135").
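A rough sketch of what such a variant could look like, assuming the platform-specific value fits in a `u32`. The `PhysicalKey` enum here and its two named variants are illustrative only, and the "UNK" display format just mimics the Discord behaviour mentioned above.

```rust
use std::fmt;

/// Illustrative subset of a physical key enum; only `Unknown` matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum PhysicalKey {
    KeyA,
    F1,
    /// Carries the raw platform-specific code so that otherwise
    /// unrecognised keys can still be told apart and bound.
    Unknown(u32),
}

impl fmt::Display for PhysicalKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PhysicalKey::KeyA => write!(f, "A"),
            PhysicalKey::F1 => write!(f, "F1"),
            // Fall back to a stable, if cryptic, name for the keybind UI.
            PhysicalKey::Unknown(code) => write!(f, "UNK{}", code),
        }
    }
}
```

Because the raw code takes part in `PartialEq` and `Hash`, two different unknown keys remain distinguishable as keybinds even though neither has a proper name.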

maroider commented 4 years ago

Bikeshed: I really prefer PhysicalKey/LogicalKey over KeyPosition/KeyLabel. It's not that important, but I think those names are more in line with the Physical/Logical split in the dpi module. Keyboard input is different from dpi, but I feel like it's similar enough for the analogy to make some sense.

ArturKovacs commented 4 years ago

Do we need the transformed (character) output in addition to CompositionEvent?

Unfortunately we do. I didn't realize this earlier myself but as @maroider pointed out, the transformed input has to be able to represent non-printable keys like Insert and Delete due to NumLock shenanigans.

Furthermore I updated the code so that a documentation comment hopefully clears up when a CompositionEvent is triggered.

I'm not sure whether KeyLabel and KeyPosition will need None/Unknown variants to handle discrepancies between platforms and input devices.

You are right, definitely. I added those too.

Do on-screen keyboards ever not imitate real keyboards?

There is no guarantee they do imitate real keyboards. Even if they do, they might contain keys that cannot be sensibly mapped to real keyboard positions which should report Unknown positions in my opinion.

Additionally, some keys like my keyboard's dedicated ⏹, ⏮️, ⏯️, and ⏭️ keys don't emit a unique scancode and instead give you 0 on Windows.

Exactly. I don't think that winit should somehow try to come up with a position for those keys. It should just be Unknown.

Instead of a transformed Option<&'static str>, there should just be a transformed KeyLabel. This way, you can get at the second layer of the numpad, which would otherwise be inaccessible with the current API.

Thanks again for pointing that out. I updated the API according to this.

The modifier-independent KeyLabel should probably be a Option<KeyLabel> for now since the web API this would depend on seems to only be an early draft and is implemented only in Chrome and Chromium-derivatives (and partially at that).

Hmm that is unfortunate indeed. Although I think this can be handled relatively gracefully until that API gains more widespread support. Instead of making the logical_key an Option, we could check if the key in the keydown event is lowercase and if it is, use that. Otherwise check if it's uppercase if it is, call to_lowercase on it, and use that. If it's neither report Unknown. This would at least allow implementing the most common shortcuts which in my view is the primary reason we have the logical_key field.
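The fallback described above could be sketched roughly as follows. The `LogicalKey` enum is a cut-down stand-in, `fallback_logical_key` is a hypothetical helper name, and the sketch assumes that only single-character `key` values go through this path (named keys such as "Enter" would be mapped elsewhere).

```rust
/// Cut-down stand-in for the proposed logical key type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum LogicalKey {
    Unicode(String),
    Unknown,
}

/// Derive a modifier-independent key from the `key` string of a keydown
/// event when no layout-map API is available.
fn fallback_logical_key(key: &str) -> LogicalKey {
    let mut chars = key.chars();
    match (chars.next(), chars.next()) {
        // Exactly one character, already lowercase: use it as is.
        (Some(c), None) if c.is_lowercase() => LogicalKey::Unicode(c.to_string()),
        // Exactly one character, uppercase: assume Shift or Caps Lock was
        // involved and undo it.
        (Some(c), None) if c.is_uppercase() => {
            LogicalKey::Unicode(c.to_lowercase().collect())
        }
        // Digits, punctuation, multi-character strings: we cannot tell
        // what the unmodified key would have been.
        _ => LogicalKey::Unknown,
    }
}
```

Note the limitation this makes visible: a shifted digit such as "!" ends up as `Unknown`, so shortcuts on non-letter keys would not work with this fallback.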

Bikeshed: I really prefer PhysicalKey/LogicalKey over KeyPosition/KeyLabel

Not a problem for me. In the updated version I renamed them like this.

maroider commented 4 years ago

Instead of making the logical_key an Option, we could check if the key in the keydown event is lowercase and if it is, use that. Otherwise check if it's uppercase if it is, call to_lowercase on it, and use that. If it's neither report Unknown.

If it's implemented this way, then it should be documented very clearly.

maroider commented 3 years ago

Instead of a transformed Option<&'static str>, there should just be a transformed KeyLabel. This way, you can get at the second layer of the numpad, which would otherwise be inaccessible with the current API.

Thanks again for pointing that out. I updated the API according to this.

Alas, I've led you slightly astray on this one. The layer I was worried about is the base layer, which would still be accessible as logical_key. The second layer (accessible with Num Lock on) is the one with numeric inputs. I usually have Num Lock on, so that's why I got myself confused. I think LogicalKey is more semantically correct (and can handle more peculiar layouts), but Option<&'static str> would have worked with most layouts.

   /// Note that the `Unicode` variant may contain multiple characters.
   /// For example when pressing <kbd>^</kbd> using a US-International
   /// layout, this will be `Dead` for the first keypress and will be
   /// `Unicode("^^")` for the second keypress.

I'm not sure if this is how dead keys behave on every platform. It's been my experience that pressing ^ twice on Linux will give me a single "^". My layout isn't "US-International", but I doubt that's the issue. In either case, since this is a "dead key thing", it should be handled in CompositionEvent. You could potentially get away with LogicalKey::Unicode("^") (on the first keypress) here by cheating a little on the Web backend and wait for the first compositionupdate which will (hopefully) reveal which dead key was pressed, since each dead key ought to produce a unique combining character. The modifier-independent value will still have to make do with LogicalKey::Dead though. The way a text editor would have to handle this would be to ignore transformed_key when there's a CompositionEvent, since dead keys ought to fire CompositionEvents.

Other than that, the only thing I have issues with is the shape of the text input part of this API. It feels subtly wrong, but I can't seem to figure out a better way to do it.

dhardy commented 3 years ago

It does seem clunky. I'd still like to see a tabulation of what data is available. Something like:

scancode (maybe)
physical location (or unknown)
unicode with modifier transforms (utf-8, may have control chars, we can ignore anything that doesn't map to unicode); may have length 0; max length unknown in general?
either:
- "command" key (arrows, F#, Ctrl, MediaPlay, ...) — this may overlap with above
- unicode, without modifier transforms (usually lower case)
compose buffer
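Written out as a data structure, the tabulation above might look something like this. All type and field names are placeholders invented for the sketch, and the scancode value in the example is the Scan Code Set 1 code for the A key.

```rust
/// Placeholder for a fixed-layout key position enum.
#[derive(Debug, Clone, PartialEq, Eq)]
enum PhysicalLocation {
    KeyA,
    Unknown,
}

/// Either a "command" key or the unmodified Unicode value.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Label {
    Command(&'static str),
    Unicode(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct KeyData {
    /// Platform scancode, if the platform provides one.
    scancode: Option<u32>,
    /// Physical location, or `Unknown`.
    physical_location: PhysicalLocation,
    /// UTF-8 text with modifier transforms applied; may be empty.
    transformed: String,
    /// Command key or unmodified Unicode value.
    label: Label,
    /// Current contents of the compose buffer; empty when not composing.
    compose_buffer: String,
}

/// Example: what pressing Shift+A on a US layout might produce.
fn shift_a_press() -> KeyData {
    KeyData {
        scancode: Some(0x1E),
        physical_location: PhysicalLocation::KeyA,
        transformed: "A".to_string(),
        label: Label::Unicode("a".to_string()),
        compose_buffer: String::new(),
    }
}
```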

Meanwhile we could categorise input as:

press, repeat (held), or release
compose start, update or end

Optimisations and redundancies:

"Compose start" is always a press event?
Release events don't need to include most of the above data so long as they can be matched against press events; currently this is done via scancode
Maybe we can attach a lifetime to WindowEvent and use &'a str instead of String
Maybe we could make KeyRelease a separate event type and only include the physical location? No, since this may be Unknown.
Any more? I don't see any, despite four simultaneous ways of representing input above.

I still think there's reason to consider my scancode/keysym API instead of @ArturKovacs's; it has a smaller message size and avoids having to translate to all types of input for both press and release.

Also, my experience with KAS is that one might have complex rules to determine how to handle a press, but handling release is normally a simple dictionary-removal using the scancode as a unique key. Without the scancode there is no unique key. Alternatively we could go with @ArturKovacs's API but add scancode, then (maybe) use a different event type with only the scancode for KeyRelease.
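For illustration, the release-handling pattern looks roughly like this. It is a simplified sketch and not code from KAS: `Action`, `PressedKeys` and the `u32` scancode type are made up for the example.

```rust
use std::collections::HashMap;

/// Assumed scancode representation for this sketch.
type ScanCode = u32;

/// Whatever the app decided a press should do.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    MoveForward,
    Jump,
}

/// Tracks which action each currently held key triggered.
#[derive(Default)]
struct PressedKeys {
    active: HashMap<ScanCode, Action>,
}

impl PressedKeys {
    /// Called after the (possibly complex) press-handling rules have
    /// decided on an action.
    fn on_press(&mut self, scancode: ScanCode, action: Action) {
        self.active.insert(scancode, action);
    }

    /// Release handling needs nothing but the scancode: remove the entry
    /// and stop whatever it started.
    fn on_release(&mut self, scancode: ScanCode) -> Option<Action> {
        self.active.remove(&scancode)
    }
}
```

This only works if the identifier is unique per key and present on both the press and the release event, which is the property a possibly-`Unknown` physical location cannot guarantee.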

dhardy commented 3 years ago

It appears that I have been terribly naive in assuming we can simply use a scancode and translate to a representation of our choice. Win32 uses a "virtual-key code" and a "scan code" and sometimes requires both plus further state for translations. Keyboard input structs have been extended for non-keyboard input and so may lack a scancode, however in this case we might choose to deliver only text input and not key input.

Also, Win32 doesn't appear to have any way of differentiating "physical location" and "key labels according to the current layout"; it simply has a Virtual-Key Code (whose value presumably depends on the current layout). We might be able to get around this by loading a specific layout such as standard US English and using this for translation in addition to the active layout? Or we could attempt translation from scan-codes, though I believe those are device dependent so probably not viable.

This makes it difficult to do better than the current API.

Are we even able to properly associate key, character and IME input? The composition-event branch lists all three as separate events.

ArturKovacs commented 3 years ago

For convenience here's one more link to my updated proposal.

I'm not sure if this is how dead keys behave on every platform. It's been my experience that pressing ^ twice on Linux will give me a single "^"

I didn't know this. Alright... I think this should match the platform-specific behaviour then, as I would certainly expect all applications to behave similarly to each other on one platform.

In either case, since this is a "dead key thing", it should be handled in CompositionEvent.

That was my thought as well but when I tested it with Firefox on Windows the Javascript API's key field only contained Dead on the first keypress and the isComposing field was set to false.

You could potentially get away with LogicalKey::Unicode("^") (on the first keypress) here by cheating a little on the Web backend and wait for the first compositionupdate

Again this is not treated as a composition event, at least on the Web, but even then I don't think this is a good idea because we want to let the applications know about at least the physical aspects of keypress events as soon as they happen. But then it becomes difficult to conveniently communicate the relationship between the physical keypress and the composition event. Although admittedly that aspect is already not perfect in my current proposal because the CompositionEnd may be detached from the physical keypress.

I'd still like to see a tabulation of what data is available.

The tabulation you just made there seems accurate to me. To answer a few questions there: I'm definitely not against exposing the platform-specific scancode through, say, an Ext trait. Otherwise the physical key is the platform-independent representation of the scancode, which I think you know; I didn't want to second-guess.

I don't have a strong opinion about whether the Unicode variant should be allowed to have a 0 length. If you think it's beneficial to provide a guarantee regarding whether it can be empty, I would say that there is no problem with that. We can just convert empty strings coming from the OS to Unknown on the implementation side, given that it's not a dead key input.

The max length of the Unicode variant of the transformed input is unknown as far as I can tell. It's up to the platform's implementation how they want to present that aspect of text input to the applications, so it can be any positive length.

"Compose start" is always a press event?

Yes (unless we find out during implementation that this cannot be guaranteed).

Release events don't need to include most of the above data so long as they can be matched against press events; currently this is done via scancode

I think there must be a balance between avoiding redundancy and ease of use. In my opinion sending all the information once more together with the key release does not tip this balance. Could you show a specific example or otherwise explain where this redundancy is undesirable?

Maybe we can attach a lifetime to WindowEvent and use &'a str instead of String

I don't see what the argument for using a reference here would be. Even if you have to make an allocation when taking the string from the OS, the performance impact of such allocations is fully negligible. Unless this is proven otherwise I don't think we should consider introducing non-static lifetimes into this interface.

I don't see any, despite four simultaneous ways of representing input above

I'm not sure what you are referring to here. Is it the press, repeat, release, and composition?

I still think there's reason to consider my scancode/keysym API instead of @ArturKovacs's; it has a smaller message size and avoids having to translate to all types of input for both press and release.

With an estimate favoring your argument, it saves let's say 30 bytes of memory when a smartwatch has at least a million times that much memory. It also saves maybe a few microseconds from a function that gets called once every 50,000 microseconds if the user is typing at 20 keystrokes a second, which is faster than the fastest recorded typing speed according to my calculations.

So with differences this small I think that lower memory and better performance are not valid arguments for picking a particular API.

Without the scancode there is no unique key.

I see. I didn't know about this use case before so I just added the scancode to my proposal.

dhardy commented 3 years ago

I'm not sure what you are referring to here. Is it the press, repeat, release, and composition?

Physical location, label (unshifted translation by current layout), translated (unicode + control chars), IME.

Of these, physical location and label may be the same thing (some type of VirtualKeyCode), but with the first using a fixed layout (US) and the latter using the active layout.

Translated input and IME input are roughly the same except that the former may include control chars and the latter may be delayed (and may be passed during multiple edit states).

maroider commented 3 years ago

It appears that I have been terribly naive in assuming we can simply use a scancode and translate to a representation of our choice. Win32 uses a "virtual-key code" and a "scan code" and sometimes requires both plus further state for translations.

Eh, you might be able to get away with using MapVirtualKeyW or MapVirtualKeyExW with MAPVK_VSC_TO_VK_EX and the scancode to get the corresponding vkey. Not sure if this would match some of the "interesting" quirks with the scancode+vkey combinations you get directly.

Also, Win32 doesn't appear to have any way of differentiating "physical location" and "key labels according to the current layout"; it simply has a Virtual-Key Code (whose value presumably depends on the current layout).

You have to go out of your way to get this information. My understanding is that the non-alphanumeric keys can't be changed much (if at all) from one layout to another.

Keyboard input structs have been extended for non-keyboard input and so may lack a scancode, however in this case we might choose to deliver only text input and not key input.

Good catch. PhysicalKey::Unknown could also work here.

Also, Win32 doesn't appear to have any way of differentiating "physical location" and "key labels according to the current layout"; it simply has a Virtual-Key Code (whose value presumably depends on the current layout). We might be able to get around this by loading a specific layout such as standard US English and using this for translation in addition to the active layout?

Yeah, there's no native solution for this. You'd have to load the current keyboard layout, say before every Event::NewEvents, since I don't think Windows notifies you that the layout has changed. There's probably also a case to be made for loading the keyboard layout on every keyboard event, since you can change the layout with a keyboard shortcut (Win+Space bar). You'd then have to check the vkey to see if it's a functional key, control pad key, arrow key, numpad key, function key, media key or backspace. If it's one of those, then I think you don't have to look further. For the other keys, you might have to use ToAsciiEx or ToUnicodeEx to get the value that's meant to be produced by a keypress + some set of modifiers.
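As a rough sketch of the "check the vkey first" step: the function below only classifies a virtual-key code and does not call into Win32. The numeric ranges are the documented Win32 virtual-key code values, but which of them can really be treated as layout-independent is my assumption, and `is_layout_independent` is a made-up name.

```rust
/// Returns true if the virtual-key code belongs to a group that (by
/// assumption) means the same thing under every keyboard layout, so no
/// ToUnicodeEx-style translation would be needed.
fn is_layout_independent(vkey: u16) -> bool {
    matches!(
        vkey,
        0x08              // VK_BACK (backspace)
        | 0x21..=0x28     // VK_PRIOR..VK_DOWN: PageUp, PageDown, End, Home, arrows
        | 0x2D..=0x2E     // VK_INSERT, VK_DELETE
        | 0x60..=0x6F     // VK_NUMPAD0..VK_DIVIDE: numpad keys
        | 0x70..=0x87     // VK_F1..VK_F24
        | 0xB0..=0xB3     // VK_MEDIA_NEXT_TRACK..VK_MEDIA_PLAY_PAUSE
    )
}
```

Anything for which this returns false (letters, digits, punctuation, OEM keys) would still go through the layout-aware translation described above.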

Or we could attempt translation from scan-codes, though I believe those are device dependent so probably not viable.

From "Keyboard Scan Code Specification":

Under all Microsoft operating systems, all keyboards actually transmit Scan Code Set 2 values down the wire from the keyboard to the keyboard port. These values are translated to Scan Code Set 1 by the i8042 port chip. The rest of the operating system, and all applications that handle scan codes expect the values to be from Scan Code Set 1. Scan Code Set 3 is not used or required for operation of Microsoft operating systems.

While that document is from the year 2000, it still seems to be the case today that the scancodes you get from Windows are (mostly) from "PS/2 Scan Code Set 1". They are also stable enough that several notable games use them for keybinds. Unfortunately, Windows doesn't emit non-zero scancodes for certain keys, so you can't have every physical key be represented by a scancode. Some of these keys shouldn't be able to be re-mapped in any way, though (outside of gaming keyboard shenanigans), so you might just get away with using the vkey to retrieve the physical location for some of those keys.

Are we even able to properly associate key, character and IME input? The composition-event branch lists all three as separate events.

Now that's something I truly don't know.

maroider commented 3 years ago

In either case, since this is a "dead key thing", it should be handled in CompositionEvent.

That was my thought as well but when I tested it with Firefox on Windows the Javascript API's key field only contained Dead on the first keypress and the isComposing field was set to false.

You could potentially get away with LogicalKey::Unicode("^") (on the first keypress) here by cheating a little on the Web backend and wait for the first compositionupdate

Again this is not treated as a composition event, at least on the Web, but even then I don't think this is a good idea because we want to let the applications know about at least the physical aspects of keypress events as soon as they happen. But then it becomes difficult to conveniently communicate the relationship between the physical keypress and the composition event. Although admittedly that aspect is already not perfect in my current proposal because the CompositionEnd may be detached from the physical keypress.

Pressing ^ should fire composition events immediately after the keydown event, unless I've misunderstood Example 26 in the uievents specification. It may, however, be challenging to associate the compositionupdate event with the keydown event.

dhardy commented 3 years ago

Thanks for the response @maroider. This seems to indicate that we could omit physical_location from the results and use a function to try mapping (scancode, vkey) to physical_location as well as physical_location → scancode (with both functions returning an Option).

Although that's only viable if all significant platforms function roughly this way.
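Roughly what I have in mind, as a sketch only: the enum is a tiny illustrative subset, the function names are placeholders, the scancodes are Scan Code Set 1 values and 0xB3 is the Win32 VK_MEDIA_PLAY_PAUSE value.

```rust
/// Tiny illustrative subset of physical locations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysicalLocation {
    KeyW,
    KeyA,
    KeyS,
    KeyD,
    MediaPlayPause,
}

/// Try to map a (scancode, vkey) pair to a physical location.
fn physical_location(scancode: u32, vkey: u32) -> Option<PhysicalLocation> {
    match (scancode, vkey) {
        // Normal keys: the scancode alone identifies the position.
        (0x11, _) => Some(PhysicalLocation::KeyW),
        (0x1E, _) => Some(PhysicalLocation::KeyA),
        (0x1F, _) => Some(PhysicalLocation::KeyS),
        (0x20, _) => Some(PhysicalLocation::KeyD),
        // Keys reported with scancode 0: fall back to the vkey.
        (0, 0xB3) => Some(PhysicalLocation::MediaPlayPause),
        _ => None,
    }
}

/// The reverse mapping; `None` where no scancode exists for the location.
fn scancode_for(location: PhysicalLocation) -> Option<u32> {
    match location {
        PhysicalLocation::KeyW => Some(0x11),
        PhysicalLocation::KeyA => Some(0x1E),
        PhysicalLocation::KeyS => Some(0x1F),
        PhysicalLocation::KeyD => Some(0x20),
        PhysicalLocation::MediaPlayPause => None,
    }
}
```

Both directions are partial, so the app has to cope with `None` in either direction.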

Eh, you might be able to get away with using MapVirtualKeyW or MapVirtualKeyExW with MAPVK_VSC_TO_VK_EX

Not a good idea since scancode may be 0.

maroider commented 3 years ago

This seems to indicate that we could omit physical_location from the results and use a function to try mapping (scancode, vkey) to physical_location as well as physical_location → scancode (with both functions returning an Option).

Although that's only viable if all significant platforms function roughly this way.

It will likely work on Linux and Windows. The web backend will likely be tricky, challenging or impossible to implement properly. What I've been able to gather from the macOS documentation suggests that it might be possible to implement this. iOS seems to have a clear separation between on-screen and physical keyboards, and what you've just described might be possible to implement. Android also has a clear separation between physical and on-screen keyboards, but Android's documentation also explicitly unifies on-screen keyboards and IMEs. What you've described should also be possible to implement here.

With the above in mind, should mobile on-screen keyboard input be treated as IME input? Fully adapting the mobile APIs for text input will likely require some additions to the IME API later down the road, but implementing whatever is decided upon here would be a huge improvement over the current state (which is essentially unimplemented).

Not a good idea since scancode may be 0.

I can't believe I forgot that.

EDIT: After rereading this comment, I feel like we need a more complete overview of what's available in what form on each platform. I've got a very incomplete document that's kind-of-sort-of that, but it needs more work.

rust-windowing / winit

A keyboard input model #753
