rust-windowing / winit

Window handling library in pure Rust
https://docs.rs/winit/
Apache License 2.0

A keyboard input model #753

Closed pyfisch closed 1 year ago

pyfisch commented 5 years ago

TLDR: I think that Winit needs more expressive keyboard events and should follow a written specification to keep platform inconsistencies to a minimum. I propose to adapt the JS KeyboardEvent for winit and to follow the UI Events specification for keyboard input.

Winit is used for many applications that need to handle different kinds of keyboard input.

Currently there are two events for text input in Winit: KeyboardInput and ReceivedCharacter.

pub struct KeyboardInput {
    pub scancode: ScanCode,
    pub state: ElementState,
    pub virtual_keycode: Option<VirtualKeyCode>,
    pub modifiers: ModifiersState,
}

The KeyboardInput event carries information about keys pressed and released. scancode is a platform-dependent code identifying the physical key. virtual_keycode optionally describes the meaning of the key. It indicates ASCII letters, some punctuation and some function keys. modifiers tells if the Shift, Control, Alt and Logo keys are currently pressed.

The ReceivedCharacter event sends a single Unicode codepoint. The character can be pushed to the end of a string and if this is done for all events the user will see the text they intended to enter.

Shortcomings

This is my personal list in no particular order.

  1. The list of VirtualKeyCode values is seen as incomplete (#71, #59). Without a given list it is hard to decide which keys to include and when the list is complete. It is also necessary to define each virtual key code so that multiple platforms map keys to the same virtual key codes. While it is probably uncontroversial that ASCII keys should be included, for non-ASCII single keys found on many keyboards, like é, µ, or ü, it is more difficult to decide and to create an exhaustive list.
  2. While VirtualKeyCode should capture the meaning of the key there are different codes for e.g. "0": Key0 and Numpad0 or LControl and RControl.
  3. The ScanCode is platform dependent. Therefore apps wanting to use keys like WASD for navigation will assume a QWERTY layout instead of using the key locations.
  4. It is unclear if a key is repeated or not. Some applications only want to act on the first keypress and ignore all following repeated keys. Right now these applications need to do extra tracking and are probably not correct if the keyboard focus changes while a key is held down. (#310)
  5. A few useful modifiers like AltGraph and NumLock are missing.
  6. There is no relation between ReceivedCharacter and KeyboardInput events. While this is not necessary for every application some (like browsers) need it and have to use ugly (and incorrect) work-arounds. (#34)
  7. Dead-key handling is unspecified and IMEs (Input Method Editors) are not supported.
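
To illustrate shortcoming 4, here is a minimal sketch of the kind of extra tracking an application has to do today to ignore repeated keys. The `RepeatFilter` type and its method names are hypothetical and not part of winit; `ScanCode` is a local stand-in alias.

```rust
use std::collections::HashSet;

/// Stand-in for a platform scancode (assumption for this sketch).
type ScanCode = u32;

/// Tracks which keys are currently held so repeated presses can be ignored.
#[derive(Default)]
struct RepeatFilter {
    held: HashSet<ScanCode>,
}

impl RepeatFilter {
    /// Returns true only for the first press of a key; repeats return false.
    fn on_press(&mut self, code: ScanCode) -> bool {
        // `insert` returns false if the code was already in the set.
        self.held.insert(code)
    }

    /// Forgets a key once it is released.
    fn on_release(&mut self, code: ScanCode) {
        self.held.remove(&code);
    }

    /// Clears all state when focus is lost. Without this, a key released
    /// while the window is unfocused stays "held" and its next real press
    /// would be treated as a repeat.
    fn on_focus_lost(&mut self) {
        self.held.clear();
    }
}
```

The focus handling is the part that is easy to get wrong, which is why a `repeat` flag supplied by the windowing library would be more reliable.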

In general there are many issues that are platform-dependent and where it is unclear or undocumented what the correct behavior is. Both alacritty and Servo, just to name two applications, have multiple issues where people mention that keyboard input does not work as expected.

Proposed Solution

Winit is not the first software that needs to deal with keyboard input on a variety of platforms. In particular the web platform has a complete specification of how keyboard events should behave, which is implemented on all platforms that Winit aims to support.

While the specification talks about JS objects it can be easily ported to Rust. Some information is duplicated in KeyboardEvent for backwards compatibility but this can be omitted in Rust so Winit stays simpler.

See keyboard-types for what keyboard events can look like in Rust.
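
As a rough illustration of the proposal, the following sketch shows one possible shape for such an event, loosely modeled on the attributes of the JS KeyboardEvent (key, code, location, repeat). All type, variant and field names here are assumptions made for this example; they are not the actual definitions from keyboard-types or winit, and only a handful of variants are shown.

```rust
/// Which of several same-meaning keys was pressed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Location {
    Standard,
    Left,
    Right,
    Numpad,
}

/// Layout-dependent meaning of a key.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Key {
    /// Printable input, possibly a base character plus combining characters.
    Character(String),
    Control,
    Enter,
}

/// Layout-independent physical position of a key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Code {
    KeyA,
    KeyZ,
    ControlLeft,
    ControlRight,
    Enter,
    NumpadEnter,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct KeyboardEvent {
    pressed: bool,
    key: Key,
    code: Code,
    location: Location,
    /// True for auto-repeated presses while the key is held down.
    repeat: bool,
}

/// Returns true if the event is a non-repeated press producing `text`.
fn is_first_press_of(event: &KeyboardEvent, text: &str) -> bool {
    event.pressed
        && !event.repeat
        && matches!(&event.key, Key::Character(s) if s.as_str() == text)
}
```

With a single event type like this, an application can react to text, physical position, or both, without correlating two separate event streams.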

Implementation

This is obviously a breaking change so there needs to be a new release of winit and release notes. While the proposed events are very expressive it is possible to convert Winit to the new events first and then improve each backend to emit the additional information about key-codes, locations, repeating keys etc.

Thank you for writing and maintaining Winit! I hope this helps to get a discussion about keyboard input handling started and maybe some ideas or even the whole proposal is implemented in Winit.

Osspial commented 5 years ago

Hi, and thanks for taking the time to put this together! Overall, I like the direction this is going, but I have some specific feedback points.

VirtualKeyCode is replaced with a Key. This is an enum with all the values for functional keys and a variant for Unicode values that stores printable characters from the whole Unicode range.

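Being more general on this would be a good change. I don't like using a full String for this, though - it introduces various issues that I'm not particularly happy with:

  • A full String is more difficult to match on than an enum, str, or char.

  • Unicode characters have multiple cases, while keyboard keys only have one case. This could introduce some tricky bugs into people's applications.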

Unfortunately I can't think of a good replacement that's as flexible as a string while accounting for both of those issues, but it's something that rubs me the wrong way.

There is just one value for keys like "Control" but if necessary one can distinguish left/right or keyboard/numpad keys by their location attribute.

I like the idea of having a left/right enum to distinguish between sided keys. However, the Location enum should be exposed through variants in the Key enum (e.g. Ctrl(Location)), rather than on the main KeyboardEvent struct we expose.

ScanCode is complemented by Code. Codes describe physical key locations in a cross-platform way. Specification

Making scan codes platform-independent is certainly something we should do, although the W3C Code specification relies a bit too much on the layout of the US keyboard for my liking. Perhaps we should use some sort of numeric index for this? I feel we should also remove ScanCode support entirely, since it doesn't seem to provide any real use for cross-platform application programming. I'd be open to a counter-example, though.

Whatever mechanism we decide on, there should be some method for translating between Codes and Keys, for display purposes.

(shortcoming 5) All known modifier keys are supported. Note: W3C decided to include some keys that are usually handled in hardware and don't emit keyboard events (like Fn, FnLock)

I'd like to leave the hardware-handled keys out of our "officially supported" keys, but this would be a good change. We may also want to create a separate ModifiersChanged event, but that needs discussion and I'm not entirely sure it's the right move.

  • (shortcoming 4) A repeat attribute is added.
  • (shortcoming 6) Received characters and keyboard events are now one (exceptions see below).
  • (shortcoming 7) To handle dead keys and IMEs a composition event is introduced. It describes the text that should be added at the current cursor position.

I'm down with all of these changes.

pyfisch commented 5 years ago

Hi, thanks for taking the time to review this!

Being more general on this would be a good change. I don't like using a full String for this, though - it introduces various issues that I'm not particularly happy with:

  • A full String is more difficult to match on than an enum, str, or char.

  • Unicode characters have multiple cases, while keyboard keys only have one case. This could introduce some tricky bugs into people's applications.

Unfortunately I can't think of a good replacement that's as flexible as a string while accounting for both of those issues, but it's something that rubs me the wrong way.

I couldn't agree more and would have preferred to use char instead. Matching is easy and the Key enum can implement Copy. The reason to use a String is that a key string has a base character and 0 or more combining characters because certain languages have keys that can't be represented with a single code point. (The problem with &str is that someone needs to own it and an enum is problematic because someone needs to decide which characters exist ahead of time and extend it for each new Unicode version.)
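
The point about combining characters can be checked with a small sketch. The helper name `as_single_char` is made up for this example: it returns a `char` only when the key string consists of exactly one code point, which is not the case for a base character followed by a combining mark.

```rust
/// Returns `Some(c)` only if `key` is exactly one Unicode code point.
fn as_single_char(key: &str) -> Option<char> {
    let mut chars = key.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => Some(c),
        // Empty string, or a base character plus combining characters.
        _ => None,
    }
}
```

For example, `"m"` fits in a `char`, while `"g\u{0303}"` (g followed by a combining tilde) is two code points and does not, so a `char`-only key value could not represent it.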

Because matching strings is so painful I wrote the ShortcutMatcher which is used by Servo. It is a quite convenient way to match keys and shortcuts. (Btw it ignores ASCII case and handles some other quirks)

Unicode characters have multiple cases, while keyboard keys only have one case.

One way to think about keyboard keys is that they have multiple levels. For example the "M" key on my keyboard has four levels that are accessed with different modifier keys: "m", "M", "µ", "º". These values should be a different Key. On the other hand, while the current VirtualKeyCodes can be typed without modifiers on a US-ASCII keyboard (I think) some variants like LBracket can only be accessed with modifier keys (in my case AltGr+8) on other keyboards. For these reasons I think it is preferable to have different Unicode cases in key values.

I like the idea of having a left/right enum to distinguish between sided keys. However, the Location enum should be exposed through variants in the Key enum (e.g. Ctrl(Location)), rather than on the main KeyboardEvent struct we expose.

Is there a specific reason to do it this way? If there is ever a need to add a location to a key that previously did not have one (e.g. Backspace on num pad) this would be a breaking change.

Making scan codes platform-independent is certainly something we should do, although the W3C Code specification relies a bit too much on the layout of the US keyboard for my liking. Perhaps we should use some sort of numeric index for this?

One upside of using names referring to the US keyboard layout is that this layout is already familiar to a lot of people and there are plenty of diagrams and photos of the layout for quick reference. Classic scancodes are too short (8-bit) and vary between keyboards from different manufacturers. One language independent index used by X11 can be seen below. (search for X11 keycode names)

I'd like to leave the hardware-handled keys out of our "officially supported" keys

Yeah, there should be a list of supported modifiers for each platform in the docs.

We may also want to create a separate ModifiersChanged event, but that needs discussion and I'm not entirely sure it's the right move.

I am not sure when I would use ModifiersChanged event as modifier keys already send keydown and keyup events.

Osspial commented 5 years ago

The problem with &str is that someone needs to own it and an enum is problematic because someone needs to decide which characters exist ahead of time and extend it for each new Unicode version.

There's a solution for using &str, actually - we could convert unicode Strings that are constructed at runtime into &'static strs as follows, then we can internally store a cache of keypress strings so that we don't consume additional memory for every keypress:

let string: String = "Hello".to_string();
// Construct a 'static string at runtime.
let x: &'static str = Box::leak(string.into_boxed_str());

That would let us pass &strs through the unicode variant and let people use string matching.
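
A minimal sketch of the cache described above, assuming a hypothetical `KeyStringCache` type. Each distinct key string is leaked at most once, so the amount of leaked memory is bounded by the number of distinct strings seen rather than by the number of keypresses.

```rust
use std::collections::HashSet;

/// Cache of leaked key strings (hypothetical type for this sketch).
#[derive(Default)]
struct KeyStringCache {
    seen: HashSet<&'static str>,
}

impl KeyStringCache {
    /// Returns a `'static` copy of `s`, leaking memory only on first sight.
    fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&cached) = self.seen.get(s) {
            return cached;
        }
        // First time this string is seen: leak one boxed copy and remember it.
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.seen.insert(leaked);
        leaked
    }
}
```

Callers can then match on the returned `&'static str` with ordinary string patterns, e.g. `match cache.intern(&text) { "a" => ..., _ => ... }`.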

For example the "M" key on my keyboard has four levels that are accessed with different modifier keys: "m", "M", "µ", "º". These values should be a different Key. On the other hand, while the current VirtualKeyCodes can be typed without modifiers on a US-ASCII keyboard (I think) some variants like LBracket can only be accessed with modifier keys (in my case AltGr+8) on other keyboards. For these reasons I think it is preferable to have different Unicode cases in key values.

The purpose of having Key-codes is to let the program figure out which keys have been pressed irrespective of any modifier-key presses - we'd want all of those characters to always be exposed under one key, since they're mapped to the same key. If you want to access the character that's outputted, taking into account modifier keys, you check the received character.

Is there a specific reason to do it this way? If there is ever a need to add a location to a key that previously did not have one (e.g. Backspace on num pad) this would be a breaking change.

Mainly, to make matching more ergonomic. If you wanted to match on both location and key with the types being separate, you'd have to do this:

match (key, location) {
    (Key::A, _) => (),
    (Key::B, _) => (),
    (Key::C, _) => (),
    (Key::Alt, _) => (),
    (Key::Ctrl, Location::Left) => (),
    (Key::Ctrl, Location::Right) => (),
    _ => ()
}

With them combined into one type, it looks like this:

match key {
    Key::A => (),
    Key::B => (),
    Key::C => (),
    Key::Alt(_) => (),
    Key::Ctrl(Location::Left) => (),
    Key::Ctrl(Location::Right) => (),
    _ => ()
}

The second version is nicer to read, and it also lets the reader know when a key's specific location is being ignored, versus when a key only has one possible location. The first version doesn't communicate that information.

Regarding adding a location to an existing key being a breaking change - there shouldn't be any reason we ever have to do that! Keyboard layouts are fairly static, and only a limited subset of keys are going to have multiple locations on the keyboard. We should be able to keep track of which ones have multiple locations and structure the enum as necessary.

One upside of using names referring to the US keyboard layout is that this layout is already familiar to a lot of people and there are plenty of diagrams and photos of the layout for quick reference. Classic scancodes are too short (8-bit) and vary between keyboards from different manufacturers. One language independent index used by X11 can be seen below. (search for X11 keycode names)

My main issue is that using the QWERTY keys to specify a layout-independent keymap feels against the spirit of providing such an API. Something in the vein of that X11 index seems like a decent solution, though.

I am not sure when I would use ModifiersChanged event as modifier keys already send keydown and keyup events.

If users could always keep track of which modifier keys have been pressed with keydown and keyup events, we wouldn't need to expose a modifiers parameter at all. The reason we expose them is because if someone presses a modifier key outside of the window then focuses the window, or presses the modifier key inside the window and unfocuses the window, the key-down/key-up events won't be properly delivered.

The reason I was floating a separate ModifiersChanged event was so that we wouldn't have to expose a modifiers variable alongside pretty much every window event, as we do now. However, I realize now that it would be simpler from a user's standpoint to provide stronger guarantees about keypress events so that they can reliably keep track of which keys have been pressed without running into the pitfalls described above (such as, guaranteeing to deliver a KeyUp event for every KeyDown event or automatically sending KeyDown events for all pressed keys when a user focuses the window).
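
One way to picture the "KeyUp for every KeyDown" guarantee is the sketch below. It is only an illustration under assumed names (`KeyEvent`, `PressedKeys`), not how any winit backend works: the library remembers which keys are down and, when the window loses focus, synthesizes the missing release events.

```rust
use std::collections::BTreeSet;

/// Simplified key event carrying a key code (assumption for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum KeyEvent {
    Down(u32),
    Up(u32),
}

/// Remembers which keys are currently down.
#[derive(Default)]
struct PressedKeys {
    down: BTreeSet<u32>,
}

impl PressedKeys {
    /// Updates the pressed set from a real event delivered by the platform.
    fn record(&mut self, event: &KeyEvent) {
        match *event {
            KeyEvent::Down(code) => {
                self.down.insert(code);
            }
            KeyEvent::Up(code) => {
                self.down.remove(&code);
            }
        }
    }

    /// On focus loss, returns a synthetic `Up` for every key still down
    /// (in ascending code order) and resets the pressed set.
    fn release_all(&mut self) -> Vec<KeyEvent> {
        std::mem::take(&mut self.down)
            .into_iter()
            .map(KeyEvent::Up)
            .collect()
    }
}
```

The mirror case, sending `Down` events for keys already held when the window gains focus, would need the platform to report the current keyboard state and is not shown here.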

Osspial commented 5 years ago

Actually, regarding device-dependent virtual key-codes - what real purpose do they provide that isn't provided by exposing the received character and the device-independent key code? I can't think of a reason for using the virtual key-codes that isn't better-served by one of the other two methods; keyboard mappings should generally be done with the device-independent keys, and character input is best done with received character events.

pyfisch commented 5 years ago

The UI Event Specification explains how keyboards work. It discusses why each part of the event is useful and how they relate to each other.

pyfisch commented 5 years ago

The purpose of having Key-codes is to let the program figure out which keys have been pressed irrespective of any modifier-key presses

Looks like we are talking about different things then. You seem to associate the visual markings on the key cap with key codes. While the UI Events specification and I refer to the functional mapping of the key.

If you want to access the character that's outputted, taking into account modifier keys, you check the received character.

What I propose is that the character that's outputted is the key. Received character is then redundant.

To match with separate key and location you can do this:

match event.key {
    Key::Home => { /* ... */ }
    Key::End => { /* ... */ }
    Key::Control if event.location == Location::Left => { /* ... */ }
    Key::Control if event.location == Location::Right => { /* ... */ }
    _ => { /* ... */ }
}

If a user does not care about key locations they don't have to know they exist at all. On the other hand if key and location are one type every user needs to know (or be told by the compiler) which keys have multiple locations to write Key::Control(_i_dont_care). (I expect this to be the common case.)

Osspial commented 5 years ago

Looks like we are talking about different things then. You seem to associate the visual markings on the key cap with key codes. While the UI Events specification and I refer to the functional mapping of the key.

Not quite - if the user has switched their keyboard layout away from what's printed on their keyboard (say, to Dvorak) the key code would correspond to the remapped keybindings. Otherwise that seems fairly accurate.

What I propose is that the character that's outputted is the key. Received character is then redundant.

So, following the UI Events specification would have us mix character input and other keypresses (ctrl, alt, arrow keys, etc.) into a single API, right? I really don't like the idea of doing that. Having that API in addition to the physical key-press and character composition APIs leads to a situation where there's a lot of overlap for what each API does:

The functional key-press API doesn't have its own specific purpose: sometimes it does things the physical keypress API does, and because it handles the majority of unicode input it makes the character composition API easy to ignore.

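I'd rather only have two keyboard input APIs:

  • Physical key-press API (layout-agnostic keypresses)
  • Character input API (handles unicode characters and composition events)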

Under this design, the purpose of each API is much more clear: the physical keypress API handles mapping each key to a function, and the character input API handles... well, all character input. Skimming through the UI Events spec it seems like it would be possible to map this API onto that, as well.

On the other hand if key and location are one type every user needs to know (or be told by the compiler) which keys have multiple locations to write Key::Control(_i_dont_care).

That's the point of merging those two events - to force users to decide whether they care or not. Whether you like that is up to personal preference, I guess; I like it because it improves the readability of the code (you know when someone's opting out of considering location vs. when there's no location to consider) and the documentation (we don't have to manually specify which keys have locations - if a key has a location, it's inherent to the declaration of the variant).

pyfisch commented 5 years ago

I'd rather only have two keyboard input APIs:

  • Physical key-press API (layout-agnostic keypresses)
  • Character input API (handles unicode characters and composition events)

Fine. How do you handle keyboard shortcuts like Control+Z (for undo)? Keep in mind that the placement of the Z key varies across common layouts and reasonable people may move the functionality of the Control key to another physical key.

Osspial commented 5 years ago

How do you handle keyboard shortcuts like Control+Z (for undo)?

I... hmm.

That's something that crossed my mind briefly when I was first writing that comment, and I'll admit that that design doesn't handle this case well. Ideally, we'd be able to keep the same physical keymap across layouts (which is what you want for things like videogame keymaps), but that also leads to problems when other software developers haven't done that, causing our applications to violate those UX standards!

Something we could do is use the UIEvents-Code keycodes (or an equivalent), and structure keyboard events like this:

struct KeyboardInput {
    /// The pressed key, ignoring keyboard layout.
    ///
    /// Alphanumeric keys always correspond to their location on a QWERTY keyboard,
    /// regardless of whether or not the user is using an alternate keymap. For instance,
    /// pressing the Z key on a QWERTZ keyboard will result in `KeyCode::KeyY` getting
    /// sent. This also ignores any other remappings (e.g. even if the user has bound
    /// Control to Caps Lock, pressing the Caps Lock key will result in `KeyCode::CapsLock`.)
    ///
    /// This is useful for things like videogame keymaps, where the physical location of a
    /// key is more important than the actual key being pressed.
    physical_key: KeyCode,
    /// The pressed key, taking keyboard layout into account.
    ///
    /// If the user is using an alternate keyboard layout or has remapped any of their keys,
    /// their preferred mappings will be sent. Unlike `physical_key`, pressing Z on a QWERTZ
    /// keyboard will output `KeyCode::KeyZ`, and rebound keys as mentioned above will output
    /// the rebound key.
    ///
    /// This is useful for desktop application keymaps, where maintaining keybinding
    /// consistency with other applications is more important than the exact location of the
    /// key pressed.
    logical_key: KeyCode,
    /* other fields intentionally omitted */
}

EDIT: I have physical_key and logical_key using the same type to make it clear that they both have the same underlying variants. We may want to split them into separate types with the same internal layout, as we've done with DPI types, but that decision isn't important for establishing whether or not this general API is a good idea.
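
The relationship between the two fields can be sketched with a toy layout table. The `Layout` enum and the `logical_key` function are hypothetical and cover only the Y/Z swap between QWERTY and QWERTZ; a real implementation would ask the platform for the active layout.

```rust
/// A tiny subset of key codes, named after their QWERTY position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyCode {
    KeyA,
    KeyY,
    KeyZ,
}

/// Keyboard layouts known to this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Layout {
    Qwerty,
    Qwertz,
}

/// Maps a physical key position to the logical key under `layout`.
fn logical_key(physical: KeyCode, layout: Layout) -> KeyCode {
    match (layout, physical) {
        // QWERTZ swaps the Y and Z positions relative to QWERTY.
        (Layout::Qwertz, KeyCode::KeyY) => KeyCode::KeyZ,
        (Layout::Qwertz, KeyCode::KeyZ) => KeyCode::KeyY,
        // Everything else is unchanged in this sketch.
        (_, other) => other,
    }
}
```

Pressing the key labelled Z on a QWERTZ keyboard therefore yields `physical_key == KeyCode::KeyY` and `logical_key == KeyCode::KeyZ`, matching the doc comments in the struct above.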

pyfisch commented 5 years ago

Well this design is a lot better.

What happens if I want to detect the "Page Up" key on my numpad? If "Num Lock" is on I want to receive the character "9" instead.

Osspial commented 5 years ago

There are two ways I can think of to do that:

My feeling is that we should take the first approach for logical_key and the second approach for physical_key. That sacrifices some consistency across the two input methods, but it also matches better with their respective stated goals.

pyfisch commented 5 years ago

Pass a location parameter alongside all keys that appear both on the numpad and elsewhere on the keyboard.

I understand that if I press the "Page Up" key I will get a logical_key of PageUp(Standard) and if I press "Page Up" on the numpad I get PageUp(Numpad). Is this correct? But if "Num Lock" is active I will instead receive "9". So some logical keys now depend on the modifiers present?

Osspial commented 5 years ago

I understand that if I press the "Page Up" key I will get a logical_key of PageUp(Standard) and if I press "Page Up" on the numpad I get PageUp(Numpad). Is this correct? But if "Num Lock" is active I will instead receive "9". So some logical keys now depend on the modifiers present?

That is correct. I realize that this may be inconsistent with my stance on the alphanumeric keys, but it feels like there's a difference here since enabling/disabling numlock fundamentally changes how those keys interact with applications, rather than just outputting a different variation of a character.

What is the logical_key value for keys not found on un-shifted US keyboards?

You're talking about these sorts of keys, right?

[image: keyboard layout with the keys in question marked in red]

For those, I'd use the Intl**** codes from the UIEvents-Code spec. Honestly, I'd lean towards replacing some of the standard US Keyboard values in that block with more international codes, seeing as there's a pretty wide range in what different keyboard locales put on those keys.

What is the correct way to detect that a user pressed ":" (colon) for vim-style controls?

Because vim mainly uses character input for its controls, I'd say to use the character input API.

pyfisch commented 5 years ago

Yes the keys marked red. But also those found on keyboards for non Latin scripts.

I understand that you want to use codes from the UIEvents-Code spec for the logical_key values. But these codes are almost arbitrary names to describe keys with a shared location but widely varying functions. I don't know when I would want to use those key values.


I don't think we can reach a consensus on keyboard events. You appear to prefer an API with just a physical location value and a separate API for character input. You made some additions to the keyboard API but it feels rather crude now and heavily relies on the assumption that you know every keyboard layout in existence and can predict how it will be used. (fixed number of key values, how does a numpad work, ...) I especially disagree with not providing an API for shifted keyboard symbols. This is available across Windows, Linux, Mac OS, but you prefer to only expose character data.

I would recommend that if winit changes its keyboard API it copies one from an existing system and does not try to have a unique variant.

Something we appear to agree on, is that there should be a code for physical keyboard locations. Maybe we can add this to the existing API?

Osspial commented 5 years ago

I don't think we can reach a consensus on keyboard events. You appear to prefer an API with just a physical location value and a separate API for character input.

To be clear: I'd like character input to be delivered alongside the physical_key and logical_key values in the same event, just not expose the key as character input. Ideally, you'd have a keyboard input event structured like this:

struct InputEvent {
    keyboard_event: Option<KeyboardEvent>,
    composition_input: Option<CompositionEvent>,
}

struct KeyboardEvent {
    physical_key: PhysicalKey,
    logical_key: LogicalKey,
    key_state: ElementState,
}

enum CompositionEvent {
    Char(char),
    CompositionStart(String),
    CompositionUpdate(String),
    CompositionEnd(String),
}

That general structure associates character input with keyboard input, but exposes them as two separate things.

I'm not comfortable with exposing character input events and keyboard input events through the same enumeration (i.e. having enum Key {UnicodeKey(String), /*everything else*/}) for a couple of reasons: one, it creates an unnecessary stumbling block when creating keyboard shortcuts. Two, it hurts internationalization of keybindings.

About keyboard shortcuts: let's say that we exposed a Key enumeration similar to what's shown in the above paragraph, with UnicodeKey exposing shifted unicode values (as far as I understand, that's the structure you proposed initially). If somebody wanted to have control+z be a shortcut for undo, they might write this code:

match (key, modifiers) {
    (Key::UnicodeKey('Z'), Modifiers { control: true, alt: false, shift: false, logo: false })
        => { /* whatever undo stuff */ }
    _ => (),
}

The issue there is, because they're matching on Z and not z, that whole undo branch becomes dead code. It's not obviously dead code; there's no way for us to make the compiler warn about it, and it doesn't seem immediately unreasonable, but it's the sort of API design that leads to developers banging their head against our library wondering why code that they'd think should work doesn't.

Regarding the second point: if a developer with a Latin-script keyboard creates a layout that associates 'a' with an action, and a Russian user (or some other user with a non-Latin keyboard) has a keyboard that doesn't output 'a' without some form of shifting, the non-Latin keyboard will in the best case have keybindings that require extra shifting to function; worst-case, the keybindings won't work at all. Conversely, non-Latin keybindings won't work on Latin keyboards, and an action bound to 'Б' will only work in a select few locales.

Neither of those are API compromises that I'm willing to accept. That's why I don't want to adopt the UI Events API verbatim - I think it's fundamentally flawed in ways that aren't obvious, but concretely harm both users and developers.


One thing that I haven't said but probably should've mentioned sooner: I'm in favor of having a mechanism for translating between our internal key enumeration and the default character output for the keyboard's layout. The intention would be to have a standardized internal structure for keyboard input and then display to the user whatever key value is associated with each particular key for their keyboard layout. I'm sorry I hadn't communicated that before - it's something that was in my head as a given, but seeing as I never wrote it down there's no way you would know that 😅.

Yes the keys marked red. But also those found on keyboards for non Latin scripts. I understand that you want to use codes from the UIEvents-Code spec for the logical_key values. But these codes are almost arbitrary names to describe keys with a shared location but widely varying functions. I don't know when I would want to use those key values.

Hey, you've gotta have some sort of arbitrary code. QWERTY just happens to be one that isn't arbitrary for a large portion of the world.

I mentioned possibly using some index-based system above, but I've since changed my mind on that. All the foreign-script keyboards I've seen from googling have also had QWERTY markings alongside their non-Latin characters, and if you're programming in Rust you need to have some amount of familiarity with a Latin keyboard to even start using the language.

You made some additions to the keyboard API but it feels rather crude now and heavily relies on the assumption that you know every keyboard layout in existence and can predict how it will be used. (fixed number of key values, how does a numpad work, ...)

How are those unreasonable assumptions to make? From the research I've done, the only differences in keyboard layouts are:

There are a limited number of "other, miscellaneous keys"; certainly few enough that we can expose them through a well-formed enum.

As far as assuming how a numpad works: it's a standard that keyboard manufacturers have settled on, and it seems to be standard across every keyboard that has a numpad. If we're making an abstraction we have to make assumptions somewhere, and there's nothing unreasonable about assuming this.

I especially disagree with not providing an API for shifted keyboard symbols. This is available across Windows, Linux, Mac OS, but you prefer to only expose character data.

What's the difference between exposing character input and shifted symbols? I've been working under the assumption that they're the same thing, but you're saying here that they're not; we may be talking about two different things here.

Something we appear to agree on, is that there should be a code for physical keyboard locations. Maybe we can add this to the existing API?

Yes, but I think we can go further with more comprehensive improvements. Like I've said elsewhere - I think that most of the ideas behind your proposal are good, I just don't agree with some of the specifics of how things should get exposed.

dhardy commented 4 years ago

Thanks for the work narrowing down an input specification. @Osspial's latest suggestion appears adequate, except that I share @pyfisch's scepticism about use of "key location" codes to describe logical_key.

An example (using X11 key location names): 1 appears as AE01 on US keyboards, Shift+AE01 on Azerty, and AltGr+AC07 on my keyboard. It should be possible to bind an action to e.g. Ctrl+1 and have it work correctly on all these keyboards. In practice, this means a semantic binding to 1, not where 1 would appear on a US keyboard. (The fact that 1 is commonly duplicated on the numpad perhaps works well with @Osspial's suggestion to use Key1(Location) within an enum.)

I also agree with @Osspial's point that the semantic value should be independent of case (people don't think of Ctrl+Z as being Shifted), however one has to be careful here: if an app is to react to Ctrl+1, then it should require the Ctrl modifier (left/right or both) but not care about the state of other modifiers...

... except, e.g., some systems use Ctrl+Z for Undo and Ctrl+Shift+Z for Redo. In general I think this can only be solved via the app checking only those modifiers it actually needs to check:

match key {
  Key::Key1 if modifiers.ctrl() => ctrl1_action(),
  Key::Z if modifiers.ctrl() && !modifiers.shift() => undo(),
  Key::Z if modifiers.ctrl() && modifiers.shift() => redo(),
  _ => (),
}
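To make the snippet above concrete, here is a minimal self-contained sketch. `Key`, `Modifiers`, `Action` and `resolve` are hypothetical names invented for illustration, not winit's actual API; the point is only that each binding inspects the modifiers it cares about and ignores the rest.

```rust
// Hypothetical types for illustration only; not winit's actual API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[non_exhaustive]
pub enum Key {
    Key1,
    Z,
}

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Modifiers {
    pub ctrl: bool,
    pub shift: bool,
    pub alt: bool,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    Ctrl1,
    Undo,
    Redo,
}

/// Resolve a key press to an action, checking only the modifiers
/// that each binding actually depends on.
pub fn resolve(key: Key, m: Modifiers) -> Option<Action> {
    match key {
        // Ctrl+1: the state of Shift and Alt is deliberately ignored.
        Key::Key1 if m.ctrl => Some(Action::Ctrl1),
        // Undo and Redo differ only in Shift, so Shift must be checked.
        Key::Z if m.ctrl && !m.shift => Some(Action::Undo),
        Key::Z if m.ctrl && m.shift => Some(Action::Redo),
        _ => None,
    }
}
```

With this shape, Ctrl+Shift+1 still triggers the Ctrl+1 action, while Ctrl+Z and Ctrl+Shift+Z stay distinct.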

Regarding the symbolic Key / VirtualKeyCode, I believe the only good option is to produce a custom enumeration. Use of a String makes it too easy for users to use invalid codes without linting. Since apps should never match against this enum exhaustively, it can be #[non_exhaustive], making extension a non-breaking change.

This leaves the following still needing precise definition:

  1. Key-location codes (enum). The Codes specification is sufficient, or X11 names could be used (although I don't believe these extend to things like TV remotes).
  2. Symbolic key names (enum). The Key values specification could be used as a starting point. Alternatively the X11 key symbols could be used (approx. 1300 entries). This is #1266 (on hold, awaiting a decision here).

cheako commented 4 years ago

There are a few others who reported the duplicate key press event feature. It's annoying; is there anything I can do to help this along?

These #1220 #146 #1184 should all be merged as duplicates.

pyfisch commented 4 years ago

The concept of consumed modifiers is relevant for how keyboard shortcuts work with national keyboard layouts.

Regarding the symbolic Key / VirtualKeyCode, I believe the only good option is to produce a custom enumeration. Use of a String makes it too easy for users to use invalid codes without linting.

I disagree. An enumeration will need constant updates and will most likely still lack relevant characters. The authors of the X11 keyboard protocol realized this at some point and decided to use Unicode code points (with a fixed offset) as keysyms for all characters without an existing keysym. Therefore I think a String or char should be used for printable symbols.

For example, on a German keyboard I can press AltGr+Shift+s, which produces ẞ (capital sharp s); this character is not found in the keysym table.
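As a rough sketch of the fixed-offset scheme: the X11 protocol encodes such characters as `0x01000000 + code point`, valid for keysyms from `0x01000100` to `0x0110FFFF`. The function and constant names below are my own, chosen for illustration; named (legacy) keysyms still need a table lookup and are simply rejected here.

```rust
/// Offset the X11 protocol adds to a Unicode code point to form a keysym.
const UNICODE_KEYSYM_OFFSET: u32 = 0x0100_0000;

/// Decode a directly encoded Unicode keysym into a `char`.
/// Returns `None` for keysyms outside the Unicode range (e.g. named
/// legacy keysyms, which require a lookup table instead).
pub fn unicode_keysym_to_char(keysym: u32) -> Option<char> {
    if (0x0100_0100..=0x0110_FFFF).contains(&keysym) {
        // `char::from_u32` also rejects surrogates and other invalid values.
        char::from_u32(keysym - UNICODE_KEYSYM_OFFSET)
    } else {
        None
    }
}
```

Capital sharp s is U+1E9E, so its keysym under this scheme is `0x01001E9E`.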

dhardy commented 4 years ago

Regardless of String vs char vs enum, winit will have to map system-specific codes to its own symbolic code internally (excepting maybe on one platform if it copies that platform's symbolic code). Thus I don't see an advantage to a String over a non-exhaustive enum.

pyfisch commented 4 years ago

Regardless of String vs char vs enum, winit will have to map system-specific codes to its own symbolic code internally

Platforms, such as X11, already provide functions to map keysyms to Unicode strings, take for example xkb_keysym_to_utf8. Internally named keysyms are looked up from a table but the directly encoded Unicode keysyms are calculated.

cheako commented 4 years ago

Is there a decision to code to the lowest common denominator or to extend interfaces that lack some features? I'm interested in an interface to XQueryKeymap(), even if it doesn't exist on every supported platform. I believe the idea of the interface is universal and should be emulated for platforms that don't provide a method to query a key's state.

cheako commented 4 years ago

I still feel like it's important to have consistency across implementations. It's not ideal if client code needs platform-dependent code to implement basic functionality.

OvermindDL1 commented 4 years ago

I also agree that the functionality should be exposed, and if it cannot be emulated on a platform then the API should return an Option/Result to state that the query could not be performed.
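A minimal sketch of what I mean, with every name here (`KeyId`, `KeyStateQuery`, `Unsupported`, `Fixed`, `is_down_or_false`) being hypothetical and not part of winit: the query returns `None` when the backend cannot answer, and callers choose a fallback instead of calling `expect()`.

```rust
/// Hypothetical key identifier; a real API would use whatever key type
/// this issue settles on.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct KeyId(pub u32);

/// Hypothetical backend trait for querying key state.
pub trait KeyStateQuery {
    /// `Some(true)` / `Some(false)` if the backend can answer,
    /// `None` if querying is not supported on this platform.
    fn is_key_down(&self, key: KeyId) -> Option<bool>;
}

/// Stand-in for a platform without any query support.
pub struct Unsupported;

impl KeyStateQuery for Unsupported {
    fn is_key_down(&self, _key: KeyId) -> Option<bool> {
        None
    }
}

/// Stand-in for a platform that knows which keys are currently held.
pub struct Fixed(pub Vec<KeyId>);

impl KeyStateQuery for Fixed {
    fn is_key_down(&self, key: KeyId) -> Option<bool> {
        Some(self.0.contains(&key))
    }
}

/// Caller-side handling without `expect()`: treat "unknown" as "not pressed".
pub fn is_down_or_false(backend: &dyn KeyStateQuery, key: KeyId) -> bool {
    backend.is_key_down(key).unwrap_or(false)
}
```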

cheako commented 4 years ago

I understand, BUT. Think about how that looks for clients: do they just expect()? If they don't expect() then they will have this boilerplate code that could instead be part of Winit. I'm not aware of Winit's practice or precedent concerning this topic. I feel strongly that Winit should maintain an "is key/button down" query tool. Obviously there could/should (to me it doesn't matter) be a way to query if the implementation is using some form of emulation or a platform-provided backend.

I'm not opposed to this tool being part of another crate, I already maintain ash-tray that's an abstraction over Winit.

dhardy commented 4 years ago

@cheako any chance we can keep this issue focussed on how to identify a key (scancode, symbol etc.) rather than on the state of the key? The issue is complex enough as it is. Also keep in mind that none of the people who commented here within the last year are core contributors to this project; I suspect like most, this project simply lacks funding.

cheako commented 4 years ago

TLDR; "Is keydown" API should be explicit in the design of key events, not an afterthought. It would be a simple "go away, bugger off" to indicate that there will be no relation between these APIs or that an "is keydown" API will never be exposed. A sad thought, but a valid one.

Until the "is keydown" API is discussed, it would be incomplete to finish the API at the core of the discussion here. For example "How will clients tie the events to querying key state?" could be important and deserves fleshing out as part of designing a key event system. The consideration to allow or not the same (scancode, symbol etc.) could drive the decisions on how those are expressed, given that the backends have two ABIs and we are investigating relating them on the front end.

When talking about "is keydown", currently, we are talking about filtering events into a state machine that tracks the keys. I've found it impossible "for me" to do this given the current implementation. That's my interest in this ticket and given that the first bullet point in the OP is "Games" I believe it's only natural to test if the proposed solution is well suited for that application. I assume the majority of games will use some kind of is keydown approach.
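For reference, the kind of event-filtering state machine I'm describing looks roughly like the sketch below. `ElementState` and `ScanCode` are local stand-ins mirroring the names in winit's `KeyboardInput`; `KeyTracker` is a made-up name. It assumes every press is eventually matched by a release, which is exactly the assumption that breaks when events are duplicated or dropped.

```rust
use std::collections::HashSet;

/// Local stand-in mirroring winit's `ElementState`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ElementState {
    Pressed,
    Released,
}

/// Local stand-in mirroring winit's platform-dependent `ScanCode`.
pub type ScanCode = u32;

/// Tracks which keys are currently held, driven purely by key events.
#[derive(Debug, Default)]
pub struct KeyTracker {
    down: HashSet<ScanCode>,
}

impl KeyTracker {
    /// Feed one key event. Returns `true` if the tracked state changed,
    /// so repeated press events for a held key return `false`.
    pub fn process(&mut self, scancode: ScanCode, state: ElementState) -> bool {
        match state {
            ElementState::Pressed => self.down.insert(scancode),
            ElementState::Released => self.down.remove(&scancode),
        }
    }

    /// The "is keydown" query, answered from the tracked state.
    pub fn is_down(&self, scancode: ScanCode) -> bool {
        self.down.contains(&scancode)
    }

    /// Forget all keys, e.g. when release events may have been missed.
    pub fn clear(&mut self) {
        self.down.clear();
    }
}
```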

cheako commented 4 years ago

I'm only developing in Linux and maybe KFreeBSD. Notably I'm unable to cross compile for wine. That said I'd be happy to break the API for the other platforms and if a group of like-minded ppl are thinking the same for the other platforms then it would be possible to write a pull request that passes a test suite on those platforms.

cheako commented 4 years ago

Covid19... Is anyone interested in starting a working group to tackle this?

pyfisch commented 4 years ago

I'm interested. But I should warn you that this is a really thorny problem and it will be difficult to agree on one design and to convince the core maintainers to merge it. Additionally all supported platforms will need to be changed.

cheako commented 4 years ago

Perhaps there should be one interface for every use case? That should remove the difficulty or at least move the complexity to client design time.

Edit: To be clear. A few interfaces where the best interface to use would be platform dependent, but each interface should be usable across all platforms. Instead of trying to bend backwards to write one interface that generally covers all use cases.

dhardy commented 4 years ago

There are several (incomplete) proposals above. Perhaps a good starting point (if someone has the time) would simply be to create a table or short document comparing these.

As a next step, perhaps we should use the RFC model — let someone write up a proposal, and post as an RFC (maybe a new PR, maybe even just a new issue + a gist) — the important parts are to have a document and to have a focussed discussion topic.

But I am not a winit maintainer — could one of the maintainers clarify a choice of discussion model?

chrisduerr commented 4 years ago

I think your proposed RFC strategy comes with too much of a maintenance burden. This can be mostly just implemented and iterated upon.

pyfisch commented 4 years ago

This can be mostly just implemented and iterated upon.

@chrisduerr In the previous discussion multiple different (and sometimes mutually exclusive) proposals have been made. Which one do you think should be implemented and iterated upon?

chrisduerr commented 4 years ago

I don't have the time to go into too much detail here, but I think the gist of it is that it doesn't really matter. Stuff like hardware position that has been discussed a lot doesn't matter since it can be easily added after the fact based on all existing suggestions, though there's hardly ever a reason to use it if you ask me (even for games you don't want to map to location really).

Ultimately winit is not going to invent some magic new silver bullet here, because there is none. Focusing too much on the Web APIs is probably just going to be confusing, but just picking one existing solution that has somewhat aged well over time and going with that is probably the best solution.

An RFC would just be the same thing as this issue with an "RFC" tag attached to it. People are going to comment on how they don't like one aspect and the author is going to respond that they need that for reason XYZ. There's no advantage to RFCs over the current discussion, it's already a proposal and comments on that.

cheako commented 4 years ago

For games the solution is to allow the user to define key mapping... That's just how most games have worked. All winit needs to do is be consistent.

maroider commented 4 years ago

@chrisduerr

(even for games you don't want to map to location really).

Why wouldn't location be interesting for games? I've been under the assumption that it's more important that the key for "move back" is below "move forward" and between "move left" and "move right" than to make the user re-configure their keybindings because their keyboard layout moves these keys around (virtually and on the keycaps).

@cheako

For games the solution is to allow the user to define key mapping... That's just how most games have worked.

That's certainly not how, say, CS:GO works. I can change the keyboard layout from QWERTY to DVORAK, but the keys I have to press to move around don't change. I'd assume that CS:GO inherits this behavior from the Source engine, which means that there are some high-profile games which use scancodes rather than virtual keycodes for keybinds.


Support for layout-independent keyboard input should be implemented directly in winit if it's going to be exposed at all. The way it's currently done is insufficient: scancodes are platform-dependent (this isn't documented anywhere), which is a bit of a letdown for a cross-platform windowing library.

In any case: Discussing whether to expose cross-platform layout-independent keyboard input or not seems pointless when @Osspial and @pyfisch seem to be very much in favor of it. It looks to me like it's a question of how and when, not if, it's going to happen.

chrisduerr commented 4 years ago

Why wouldn't location be interesting for games? I've been under the assumption that it's more important that the key for "move back" is below "move forward" and between "move left" and "move right" than to make the user re-configure their keybindings because their keyboard layout moves these keys around (virtually and on the keycaps).

Because it's basically useless. For a lot of stuff you want mnemonics, so that doesn't help at all. For location the only really relevant thing is WASD maybe, but if that's all you need just use the arrow keys. But even WASD can't be used universally because of different keyboard layouts, since the same "location" isn't always in the same place.

So one way or another people will have to remap their bindings. Focusing on something that doesn't even get fixed by the solution just doesn't make much sense. Especially when such a thing isn't essential to the protocol and can be easily added after the fact.

dhardy commented 4 years ago

That's certainly not how, say, CS:GO works.

I've noticed a lot of games get this right (layout-independent codes) and a lot get it wrong. I use Colemak but bear in mind there are a lot of users with non-Qwerty layouts, e.g. Azerty.

Because it's basically useless. For a lot of stuff you want mnemonics,

Partly yes, partly no. Should Ctrl+Z be awkward to press on a Qwertz keyboard? (Okay, that's already too "standard" to fix.) Many games map far more than just WASD; sure, memorising all the keys is a pain, but in many cases location is more important than label.

scancodes are platform-dependent (this isn't documented anywhere)

Nevertheless a layout-independent mapping could be implemented over this via an extension mapping from location to scancode (and ideally also key label). IMO this may be the best route forward simply because it enables divide-and-conquer.

maroider commented 4 years ago

Nevertheless a layout-independent mapping could be implemented over this via an extension mapping from location to scancode (and ideally also key label).

Did you mean scancode -> location rather than location -> scancode here? I can imagine how you'd go about implementing the former (#[cfg(os)]-ed match-ing over the value of the scancode), but I can't say the same for the latter.
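To illustrate what I mean by `#[cfg]`-ed matching, here is a small sketch covering only WASD. `Location` and the function names are hypothetical. The numeric values are Linux evdev key codes (`KEY_W` = 17, `KEY_A` = 30, `KEY_S` = 31, `KEY_D` = 32) and macOS virtual key codes (`kVK_ANSI_W` = 0x0D, `kVK_ANSI_A` = 0x00, `kVK_ANSI_S` = 0x01, `kVK_ANSI_D` = 0x02); whether these are exactly what winit reports as `scancode` on each platform is an assumption I haven't verified.

```rust
/// Hypothetical layout-independent key locations, named after the
/// US QWERTY label found at that position.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Location {
    KeyW,
    KeyA,
    KeyS,
    KeyD,
}

/// Map Linux evdev key codes to locations.
pub fn location_from_evdev(scancode: u32) -> Option<Location> {
    match scancode {
        17 => Some(Location::KeyW),
        30 => Some(Location::KeyA),
        31 => Some(Location::KeyS),
        32 => Some(Location::KeyD),
        _ => None,
    }
}

/// Map macOS virtual key codes to locations.
pub fn location_from_macos(scancode: u32) -> Option<Location> {
    match scancode {
        0x0D => Some(Location::KeyW),
        0x00 => Some(Location::KeyA),
        0x01 => Some(Location::KeyS),
        0x02 => Some(Location::KeyD),
        _ => None,
    }
}

/// Platform dispatch: exactly one of these definitions is compiled.
#[cfg(target_os = "linux")]
pub fn location_from_scancode(scancode: u32) -> Option<Location> {
    location_from_evdev(scancode)
}

#[cfg(target_os = "macos")]
pub fn location_from_scancode(scancode: u32) -> Option<Location> {
    location_from_macos(scancode)
}

/// Platforms without a table in this sketch report no location.
#[cfg(not(any(target_os = "linux", target_os = "macos")))]
pub fn location_from_scancode(_scancode: u32) -> Option<Location> {
    None
}
```

Going the other way (location to scancode) would need the inverse tables, which is the part I can't picture as easily.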

dhardy commented 4 years ago

I guess it doesn't matter. One can configure location codes, and either map to scancode and match that against input, or map input scancode → location and match that against configured codes. Either way, the mapping function can be added later.

Also desirable is a way to map location → key label so that location-based configuration can show users the key's label — I have no idea whether this is possible, but what I have seen in games suggests it probably is.

chrisduerr commented 4 years ago

Partly yes, partly no. Should Ctrl+Z be awkward to press on a Qwertz keyboard?

Yes, it definitely should be. Because that's the only way to communicate the location of the binding to the user. "Undo is mapped to the button where Z would be on your keyboard assuming you don't have a keyboard that reports buttons at different locations than we think where it's at" just doesn't really have a very "convenient" ring to it.

in many cases location is more important than label

I completely disagree. Especially with how unreliable that is.

pyfisch commented 4 years ago

Yes, it definitely should be. Because that's the only way to communicate the location of the binding to the user. "Undo is mapped to the button where Z would be on your keyboard assuming you don't have a keyboard that reports buttons at different locations than we think where it's at" just doesn't really have a very "convenient" ring to it.

It doesn't have to be. Say you want to inform a user that they need to use WASD on a US-keyboard to move. At least on Linux (libxkbcommon) you then call a function with a layout independent code for the W key and receive the correct label according to the selected keyboard layout. Repeat for all four keys. I assume that there is a similar interface on Windows and Mac as @dhardy suggests.

This interface certainly should not be part of the keyboard input MVP but may be necessary to give the best possible experience.

chrisduerr commented 4 years ago

It doesn't have to be. Say you want to inform a user that they need to use WASD on a US-keyboard to move. At least on Linux (libxkbcommon) you then call a function with a layout independent code for the W key and receive the correct label according to the selected keyboard layout. Repeat for all four keys. I assume that there is a similar interface on Windows and Mac as @dhardy suggests.

That assumes the user goes looking for the key combinations inside of the application.

This interface certainly should not be part of the keyboard input MVP but may be necessary to give the best possible experience.

This was the primary point I was trying to make. It doesn't make much sense to start splitting hairs over this.

maroider commented 4 years ago

I assume that there is a similar interface on Windows and Mac as @dhardy suggests.

MapVirtualKeyW with MAPVK_VSC_TO_VK_EX might do the trick on Windows.


Regarding how VirtualKeyCode should be handled WRT Linux:

There seems to be a need for VirtualKeyCode to have a Unicode(&'static str) variant (where &'static str is an intentionally leaked String) since Linux keycodes (+ some set of modifiers) can seemingly map to arbitrary keysyms/Unicode code points.
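A sketch of the leaked-string idea, using a cut-down stand-in for `VirtualKeyCode` (the `Unicode` variant and the `unicode_key` helper are hypothetical). `Box::leak` is a real standard library function; note that each call leaks one allocation, so a real implementation would presumably want to intern strings rather than leak on every key press.

```rust
/// Cut-down, hypothetical stand-in for winit's `VirtualKeyCode`
/// with the proposed `Unicode` variant added.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VirtualKeyCode {
    Key3,
    Unicode(&'static str),
}

/// Wrap arbitrary text produced by the keymap in a `Copy`-able key code
/// by intentionally leaking the backing allocation.
pub fn unicode_key(text: String) -> VirtualKeyCode {
    // The allocation is never freed: this trades memory for a 'static lifetime.
    let leaked: &'static str = Box::leak(text.into_boxed_str());
    VirtualKeyCode::Unicode(leaked)
}
```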

Would it be possible to always return the VirtualKeyCode you'd get without any modifiers? E.g. if I've got a weird keymap which is mostly the US one, except I've replaced the keysym bound to Shift+3 with ö, could winit then give me VirtualKeyCode::Key3?

Clearly, the mapping without modifiers is the proper "meaning" of the key (that's how it looks to me at least).

cheako commented 4 years ago

Changing winit to fix issues with existing games? If we want all games (maybe using winit) to work in all situations, then promoting sane practices is the only way.

Sane practices like having a robust key to action editor... Promoting games that force key locations onto users does a disservice to users. I find that WASD is located on the far left of my keyboard, leaving no room on that side for hot-keys. However YGHJ is centrally located making more keys not far from your hand while moving.

maroider commented 4 years ago

If we want all games (maybe using winit) to work in all situations, then promoting sane practices is the only way.

Sane practices like having a robust key to action editor...

No one here is saying that games (or other applications) shouldn't have user-configurable keybinds (or that scancodes invalidate the need for such a feature). I'd even go so far as saying that any game which ships without some kind of keybind editor is lesser for it. But the presence of user-configurable keybinds does not invalidate the need for sane (or even just usable) defaults, and that's what using scancodes instead of virtual keycodes should provide for games.

Making an application with user-configurable keybinds straight on top of winit isn't necessarily all that easy, but that's something that can be solved in other crates. The addition of a real API for key locations wouldn't change this much. User-configurable keybinds is either something winit shouldn't tackle or something that should be tackled separately from this issue. My money is on the former being the case.

Promoting games that force key locations onto users does a disservice to users. I find that WASD is located on the far left of my keyboard, leaving no room on that side for hot-keys. However YGHJ is centrally located making more keys not far from your hand while moving.

Again, no one is saying that games must use WASD. WASD is just mentioned because it's the conventional set of movement controls, not because anyone mentioning WASD is convinced that WASD is the best thing ever and wants to force it on others.

cheako commented 4 years ago

Defaults is something that happens once, less than once on startup... Why should we run code on every keystroke for defaults?

dhardy commented 4 years ago

Promoting games that force key locations onto users does a disservice to users.

The opposite — having to rebind WASD keys etc. because I use a different keyboard layout and the keys end up in inconvenient locations for one-handed use is a pain. The only one trying to force applications to do a specific thing here is you — both "label" and "location" (default) bindings should be supported.

Would it be possible to always return the VirtualKeyCode you'd get without any modifiers?

I don't know. But I'm also not sure it's desirable. E.g. + is Shift+= on a QWERTY keyboard but a dedicated key on QWERTZ, so a binding to + should not be labelled Shift+=. Conversely, a binding to Shift++ might not be usable (or at least not distinct from +) on QWERTY, so one has to be careful with default key-maps. Ultimately in some cases the right thing may be to have multiple defaults and choose one based on locale and/or platform.

Why should we run code on every keystroke for defaults?

That shouldn't be necessary? This is why I suggested mapping from key positions to scancodes. But even if only the opposite map function (scancode → location) is available, apps can invoke this only as needed (potentially filling out a dictionary the first time each key is pressed).
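A sketch of the lazily filled dictionary, assuming some scancode → location function exists (here passed in as a closure; `LocationCache` and the use of plain `u32` for both codes are placeholders of mine). The mapping function runs at most once per distinct scancode, not on every keystroke.

```rust
use std::collections::HashMap;

/// Caches the result of a (possibly expensive) scancode -> location lookup.
/// Both codes are plain `u32` placeholders in this sketch.
pub struct LocationCache<F> {
    map_fn: F,
    seen: HashMap<u32, Option<u32>>,
}

impl<F: FnMut(u32) -> Option<u32>> LocationCache<F> {
    pub fn new(map_fn: F) -> Self {
        LocationCache {
            map_fn,
            seen: HashMap::new(),
        }
    }

    /// Look up a location, calling the mapping function only the first
    /// time a given scancode is seen (misses are cached as well).
    pub fn location(&mut self, scancode: u32) -> Option<u32> {
        let map_fn = &mut self.map_fn;
        *self.seen.entry(scancode).or_insert_with(|| map_fn(scancode))
    }
}
```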

maroider commented 4 years ago

@dhardy I think I might not have explained my thoughts well enough.

What I was proposing would lead to + on QWERTZ always giving + as a VirtualKeyCode, regardless of modifiers. Now that I've thought about it some more, I can't quite decide on how that would work with Windows or AZERTY. Windows can't, to my knowledge, quite re-map its virtual keycodes in the same way. AZERTY swaps numbers and symbols on the number keys, so that would also make it behave very differently on Windows. Making VirtualKeyCode completely consistent across platforms seems a bit more complicated than I thought. I'm not even sure if it's entirely desirable now.

dhardy commented 4 years ago

Which is why I suggested having a huge enum like X11, but I'm not really sure if that's desirable either.