Clarify `keypress` event handling for keys that map to non-BMP Unicode symbols

w3c / uievents

UI Events

https://w3c.github.io/uievents/

Clarify `keypress` event handling for keys that map to non-BMP Unicode symbols #346

Open mathiasbynens opened 1 year ago

mathiasbynens commented 1 year ago

See https://github.com/w3c/webdriver/issues/1741: browsers don’t agree on keypress events for keys that map to non-BMP Unicode symbols (i.e. code points beyond U+FFFF).

You can reproduce this on https://w3c.github.io/uievents/tools/key-event-viewer.html using a custom keyboard layout. I’m using https://github.com/mathiasbynens/custom.keylayout/tree/main/qwerty which lets me press a key to type 𝌆 (U+1D306), which consists of the surrogate halves U+D834 U+DF06.
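For reference, the two surrogate halves quoted above follow from the standard UTF-16 encoding arithmetic. A minimal sketch (the helper name `toSurrogatePair` is made up for illustration and is not part of any spec):

```typescript
// Split a non-BMP code point (U+10000..U+10FFFF) into its UTF-16 surrogate halves.
// `toSurrogatePair` is a hypothetical helper used only for this illustration.
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not a non-BMP code point");
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> high (leading) surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low (trailing) surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1d306);
console.log(high.toString(16), low.toString(16)); // d834 df06
```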

In Safari, a single keypress event is emitted, with charCode/keyCode/which set to the full Unicode code point 0x1D306. (This is the behavior I’d expect as a user.)
In Firefox, two keypress events are emitted, one for each surrogate half (0xD834 and 0xDF06).
In Chrome, no keypress event is emitted.

Screenshot showing (from top to bottom) Safari, Firefox, and Chrome:

masayuki-nakano commented 1 year ago

Firefox developers must have intended it to work with String.fromCharCode(event.charCode), and IIRC, a pair of WM_CHAR is sent by Windows for a surrogate pair input.

jrandolf commented 1 year ago

IIRC, a pair of WM_CHAR is sent by Windows for a surrogate pair input.

@masayuki-nakano Thank you for clarifying this. Since this is the case, it would seem that there is some misalignment with Safari because Safari is only available on macOS and macOS dispatches the entire surrogate pair for an input.

With this information, there are three viewpoints to this problem:

1. From the OS perspective, the browser should emit as many keypress events as the OS sends it. This implies the following table:

	Chrome	Firefox	Safari
Windows	2	2	N/A
macOS	1	1	1

2. From the user perspective, the browser should emit as many keypress events as the user makes key presses. This implies the following table:

	Chrome	Firefox	Safari
Windows	1	1	N/A
macOS	1	1	1

3. From the browser perspective, we should normalize behavior across platforms based on either (1) or (2). This implies the following tables:

Table (a)

	Chrome	Firefox	Safari
Windows	2	2	N/A
macOS	2	2	2

Table (b)

	Chrome	Firefox	Safari
Windows	1	1	N/A
macOS	1	1	1

jrandolf commented 1 year ago

I suggest we align with the OS perspective for the following reasons:

Safari's approach is not incorrect according to the spec and expecting Safari to change this due to OSes Safari doesn't work on is a stretch. This implies we cannot do (3)(a).
This is the simplest approach from a programmatic perspective (for both Firefox and Chromium) and is cross-browser aligned (not cross-OS however).
Since keypress is deprecated and already has cross-browser misalignment, (2) and (3)(b) are not practically expected by the user.

masayuki-nakano commented 1 year ago

I don't think aligning with OS behavior is a good approach. I guess that most web developers do not test key input behavior on all major platforms for each browser. Therefore, behavior that differs between OSes may inconvenience end users.

keypress shouldn't be used for text input handling in new web apps. Therefore, the existing web apps that use keypress events for that purpose may be unmaintained. If so, fixing the incompatible behavior may break some of them.

On the other hand, I don't know of any keyboard layouts that have a key for inputting a non-BMP character. Therefore, this issue may appear only in specific environments. (Note that for input such as the Emoji palette in each OS, browsers do not handle it as a key sequence, so this is really a special case for most users.)

masayuki-nakano commented 1 year ago

FYI: a bug in Firefox

masayuki-nakano commented 1 year ago

Ah, this may be a dup of #227 (although it's InputEvent).

drwez commented 1 year ago

Note that if you enter e.g. an emote using the Windows On-Screen Keyboard then that will be expressed as two distinct keydown/keypress/keyup sequences, with the keypress part of each describing one of the two UTF-16 surrogates.

For those events, though, the VKEY value is "PACKET" - the keydown/keyup events are essentially a platform-specific quirk that conveys almost no extra information - so it could make sense for the browser to simply drop those events, and coalesce the WM_CHARs into a single valid Unicode code-point.

As Masayuki points out, though, given that the keyCode field and keypress event are deprecated and their spec is normative should it really define a behaviour, or simply document the UTF-16 & UCS4 models for keypress events, and tweak the MUST wording for the implementation of keypress for non-BMP?

Pauan commented 1 year ago

@drwez When using the On-Screen Keyboard, Internet Explorer and Edge both send a single event. This is not a Windows platform issue, it's not an OS issue.

The bad behavior is specific to Firefox / Chrome on Windows, and unfortunately it does cause real-world problems:

https://github.com/Pauan/rust-dominator/issues/10

https://github.com/rustwasm/wasm-bindgen/issues/1348

It's a widespread problem that affects many events, not just keydown/keypress/keyup. Even idiomatic events like input also have the same problem.

Safari's behavior is correct. Internet Explorer / Edge behavior is correct.

Chrome and Firefox are simply buggy, they are sending invalid incorrect strings. They should be fixed so that they behave correctly and consistently. Here are the relevant bug reports for those browsers:

https://bugzilla.mozilla.org/show_bug.cgi?id=1541349

https://bugs.chromium.org/p/chromium/issues/detail?id=949056

Ideally this behavior would be specified in the spec, so that way it is easier for the browsers to coordinate their behavior.

I don't know what the wording should be, but the behavior should be something like "if an input character is outside of the BMP then the browser MUST NOT send multiple events (one event per surrogate), instead it MUST send a single event (which contains both halves of the surrogate pair)".

masayuki-nakano commented 1 year ago

Chrome and Firefox are simply buggy, they are sending invalid incorrect strings.

No, .charCode is not a String, it's unsigned long. That makes the things complicated. For appending it to a String object, it requires to call String.fromCharCode instead of +=. If it were String, just fixing in Firefox and Chrome must have been fine and no risk.

Pauan commented 1 year ago

@masayuki-nakano I don't have a Windows machine right now, so I can't test it, but how does Edge handle charCode for non-BMP characters? When I last tested Edge version 42, Edge sent only 1 event, not 2.

I understand the web compat issues, but if Safari and Edge have already fixed the issue, then the web compat issue must not be that big of a deal, or we would have heard about it.

mathiasbynens commented 1 year ago

Chrome and Firefox are simply buggy, they are sending invalid incorrect strings.

No, .charCode is not a String, it's unsigned long. That makes the things complicated. For appending it to a String object, it requires to call String.fromCharCode instead of +=. If it were String, just fixing in Firefox and Chrome must have been fine and no risk.

Can you clarify what you mean? I might be misunderstanding. Turning a numeric UTF-16 code unit into a string requires String.fromCharCode, regardless of whether astral symbols / surrogate pairs are at play.

drwez commented 1 year ago

I believe Masayuki's point is that String.fromCharCode() takes a sequence of UTF-16 code-units, not UCS4 code-units (i.e. code-points). Changing the legacy charCode to return Unicode code-points rather than UTF-16 code-units would mean "char code" being used inconsistently across different contexts.
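To illustrate the difference: `String.fromCharCode()` converts each argument to a 16-bit code unit, so a full code point above U+FFFF is silently truncated, while `String.fromCodePoint()` accepts it directly. A small sketch, using the U+1D306 example from the top of this issue:

```typescript
const codePoint = 0x1d306; // 𝌆, i.e. a Safari-style charCode

// String.fromCharCode() keeps only the low 16 bits of each argument,
// so the full code point is truncated to U+D306 (a different character).
const truncated = String.fromCharCode(codePoint);

// String.fromCodePoint() accepts the code point directly.
const viaCodePoint = String.fromCodePoint(codePoint);

// Firefox-style charCodes (one per surrogate) do work with String.fromCharCode().
const viaCodeUnits = String.fromCharCode(0xd834, 0xdf06);

console.log(truncated === "\ud306");        // true
console.log(viaCodePoint === viaCodeUnits); // true
```

So legacy code that calls `String.fromCharCode(event.charCode)` would produce the wrong character if charCode started carrying full code points.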

With input events there is only the data field, conveying the input text as a text string, though with the fun caveat that (at present) it seems that some platforms (e.g. Windows) supply UTF-16 surrogate code-units in independent events, which is a little painful.

To reiterate my earlier point: Defining the desired behaviour for the modern input events (e.g. input, keydown) in the presence of multi code-unit input (e.g. UTF-16 surrogate pairs) seems the first place to focus. For the legacy events the right normative specification may need to differ (e.g. continuing to have keypress delivered once per UTF-16 code-unit, for example), reflecting how things have historically worked, rather than how we'd ideally expect/want them to. :)

drwez commented 1 year ago

Re the event sequence from the Windows OSK: I don't have a device handy right now with which to verify the platform level behaviour wrt keydown/keyup (or rather WM_KEYDOWN/WM_KEYUP), but AFAIK it is expected that there are two keypress events for each non-BMP character typed.

Pauan commented 1 year ago

With input events there is only the data field, conveying the input text as a text string, though with the fun caveat that (at present) it seems that some platforms (e.g. Windows) supply UTF-16 surrogate code-units in independent events, which is a little painful.

That is not a Windows platform issue, because Internet Explorer and Edge do not have that issue.

It is a deviant behavior from Chrome / Firefox only (which happens to only affect Windows).

drwez commented 1 year ago

We may be talking about different things, or different versions of Windows, then?

Using the OSK on a Windows 10 device, Edge shows the same behaviour as Chrome (which is unsurprising, since both are based on Chromium), with two keydown/keypress/keyup sequences, one for each code-unit of the surrogate pair used to encode the emote.

Pauan commented 1 year ago

@drwez You specifically mentioned the input event. When I tested Edge 42, it only sent 1 event, whereas Firefox / Chrome sent 2 events.

It seems Edge 42 was before the switch to Chromium. So it is not a Windows issue, because the EdgeHTML engine did not have that issue. It is specific to the Blink / Gecko engines. And that means it is fixable, it is not an OS limitation, it is specific to particular browser engines.

That also means that for several years Edge on Windows did not have this bug, but Firefox / Chrome did have this bug. Which means websites already needed to take into account the (correct) behavior of Edge and Safari, so the compat issues should be minimal.

drwez commented 1 year ago

Ah, OK. input is a distinct event from the keypress event that this issue discusses; I mentioned it in comparison to the (legacy) keypress event, since it suffers similarly from reflecting the underlying platform behaviour too closely, at present.

As you point out, though, the current behaviour of Chrome (and presumably Edge) for Unicode code-points that require surrogate pairs to express in UTF-16 is incorrect with respect to the InputEvent.data wording in the current UI Events spec. The Windows platform conveys non-BMP characters via a pair of WM_CHAR events, each holding one code-unit of the UTF-16 surrogate pair, and Chromium is presumably just routing those directly to input events, whereas the old Edge engine (and others) does some additional processing to only surface complete code-points. I've filed a bug against Chromium for that (crbug.com/1450498).

jrandolf commented 1 year ago

It seems we are going back and forth here. Let me summarize the current situation based on my work and all the evidence currently available:

I.E., Safari, and Edge (Non-chromium-based) have the expected, spec-defined behavior.
Chromium-based browsers and Firefox do not have the expected, spec-defined behavior.
- These browsers implement direct translation of OS events to key events.
  - On Windows, this implies two events are sent for a surrogate pair.
  - On macOS, this implies one (chromium is currently broken w.r.t. macOS surrogate pairs)
  - Fix is ongoing: https://chromium-review.googlesource.com/c/chromium/src/+/4561928

This implies Chromium-based browsers and Firefox need an intermediate layer that joins the surrogate pairs to emit a proper keyboard sequence on Windows.

@Pauan @drwez There are two perspectives here. One can blame Windows for dispatching two events or one can blame the bad browsers for not handling two events on Windows. Since it's not reasonable to expect Windows to change behavior (obviously), we can attribute the blame to the browsers.

For everyone in this issue, let's try to conclude with a solution:

I believe the spec is well-defined, in that surrogate pairs must be fully concatenated before emitting an event. Although this is not specifically stated, it's a given considering past behavior in Edge and I.E. + current behavior in Safari. WDYT?

drwez commented 1 year ago

Re:

It seems we are going back and forth here. Let me summarize the current situation based on my work and all the evidence currently available: [snip]

This description conflates the keypress and input events, which are two different events - one is legacy, one is explicitly specified.

Re:

I believe the spec is well-defined, in that surrogate pairs must be fully concatenated before emitting an event. Although this is not specifically stated, it's a given considering past behavior in Edge and I.E. + current behavior in Safari. WDYT?

No, that's not a given at all I'm afraid. If things were appropriately defined in the spec then this spec issue would not exist :)

The keypress event and charCode documentation describes legacy pre-spec events; those events have historically delivered UTF-16 code-units, and the charCode terminology elsewhere refers to UTF-16 code-units - it's not clear that it would make sense to change that now.

The input event is explicitly specified to return strings of characters (i.e. code points) - so it is specified differently from keypress. Chromium (and newer Edge builds) does not implement things that way consistently - I've filed crbug.com/1450498 for the Chromium issue under Windows. Again, though, that's an implementation bug, not a spec issue.

jrandolf commented 1 year ago

No, that's not a given at all I'm afraid.

I think there is miscommunication. It is explicitly stated that the key code is given as the unicode code point (or 0). See https://www.w3.org/TR/uievents/#determine-keypress-keyCode. When I wrote

in that surrogate pairs must be fully concatenated before emitting an event

I meant that this specific statement wasn't stated, but this statement is a given since a broken surrogate pair is not a unicode code point.

If things were appropriately defined in the spec then this spec issue would not exist :)

The reason this issue exists is not because the spec is not well-defined, but the fact there is misalignment between implementations and the spec. Again, there are two perspectives here:

1. The browsers are being bad and implementing the spec wrong, or
2. the spec is too strict and we need to relax the conditions.

I'm stating that we should admit (1) and reinforce the current wording of the spec with specifics on surrogate handling.

masayuki-nakano commented 1 year ago

I believe Masayuki's point is that String.fromCharCode() takes a sequence of UTF-16 code-units, not UCS4 code-units (i.e. code-points). Changing the legacy charCode to return Unicode code-points rather than UTF-16 code-units would mean "char code" being used inconsistently across different contexts.

Yes, that's my point. If .charCode may contain a non-BMP character's code point, web apps need to use String.fromCodePoint instead. However, .charCode exists for older web apps. Therefore, I assume that changing its meaning would break unmaintained web apps if they use String.fromCharCode.

If web apps want to access a (maybe) valid Unicode character, .key has been available for 6 years. Therefore, we should keep the legacy API behavior as-is for avoiding to break web apps in the wild.

masayuki-nakano commented 1 year ago

In my understanding, .key, .code are intended to replace .charCode and .keyCode with keeping backward compatibility. Therefore, changing the legacy ones' behavior may duplicate same functional API.

masayuki-nakano commented 1 year ago

And in Firefox's case, the behavior originates in the code path that handles dead keys on Windows. If you type a dead key and then KeyQ (assuming it's an invalid combination), a punctuation character corresponding to the dead key and the character for KeyQ are both sent as WM_CHARs for the last WM_KEYDOWN. A key press for a non-BMP character works the same way (except for the preceding dead key down/up sequence). Therefore, the behavior exists for historical reasons. I don't know about Chrome; they might just emulate the same behavior as IE and Firefox.

jrandolf commented 1 year ago

@masayuki-nakano I think our browser logic is somewhat similar. It should be since Firefox and Chrome output the same sequence on Windows. On macOS, Chrome is just broken w.r.t. unicode keypresses.

Therefore, we should keep the legacy API behavior as-is for avoiding to break web apps in the wild.

Since each browser already has different behavior, all web apps using keypress are already broken for some OS/browser pair. The point of this issue is not to modify the legacy API drastically to match some expected behavior. Since it's legacy, I suggest we move forward with the original solution I proposed: if we align with the OS behavior, the modifications we have to do on our end (Firefox and Chromium) will be minimal

For keydown, keyup, and input, we obviously stick strictly to the spec; only unicode code points, no broken surrogate pairs.

jrandolf commented 1 year ago

Alright, so after some internal discussion with @drwez, we've designed the following solution:

Let's remark in the spec that keypress SHOULD be UTF-16. This will allow Safari's behavior and allow Firefox and Chromium to maintain current behavior.
keydown, keyup, and input shall remain the same.

@drwez @masayuki-nakano WDYT?

drwez commented 1 year ago

Focusing on the scope of this bug (i.e. just the spec, not the bugs in the various implementations), it sounds like there are one or two AIs:

1. Revise the wording around keypress:
- Describe that implementations SHOULD for compatibility use the keypress-per-surrogate model, in which case charCode MUST hold one of the two UTF-16 surrogate code-units.
- Describe that implementations MAY instead emit a single keypress, in which case charCode MUST be set to the Unicode code-point of the generated character. We might also provide a snippet of the rudimentary logic required to cope with both behaviours.
- Decide whether to express that implementations MAY not emit keypress at all (given that it is a legacy event).
2. Add explicit wording to input.data's specification regarding whether implementations MUST, or SHOULD, or needn't, ensure to deliver non-BMP characters whole, or whether events with input.data containing individual surrogates are acceptable.
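As a rough idea of what the "rudimentary logic required to cope with both behaviours" could look like on the web-app side, here is a minimal sketch. The class name `CharCodeDecoder` and its `feed` method are hypothetical, and how to treat a stray surrogate is an assumption of this sketch, not something the spec defines:

```typescript
// Hypothetical helper: turns a stream of keypress charCode values into text,
// whichever of the two models the browser uses.
class CharCodeDecoder {
  private pendingHigh: number | null = null;

  // Returns the decoded character, or "" while waiting for a low surrogate.
  feed(charCode: number): string {
    if (charCode > 0xffff) {
      // Single-keypress model: charCode is a full code point.
      this.pendingHigh = null;
      return String.fromCodePoint(charCode);
    }
    if (charCode >= 0xd800 && charCode <= 0xdbff) {
      // Keypress-per-surrogate model: remember the high surrogate.
      this.pendingHigh = charCode;
      return "";
    }
    if (charCode >= 0xdc00 && charCode <= 0xdfff && this.pendingHigh !== null) {
      // Second keypress of the pair: combine both code units.
      const text = String.fromCharCode(this.pendingHigh, charCode);
      this.pendingHigh = null;
      return text;
    }
    // BMP character. Assumption: a dangling high surrogate is dropped and an
    // unexpected lone low surrogate is passed through unchanged.
    this.pendingHigh = null;
    return String.fromCharCode(charCode);
  }
}

const decoder = new CharCodeDecoder();
const perSurrogate = decoder.feed(0xd834) + decoder.feed(0xdf06); // two keypress events
const singleEvent = decoder.feed(0x1d306);                        // one keypress event
console.log(perSurrogate === singleEvent); // true
```

In a page, `feed` would be called with `event.charCode` from a keypress listener; that wiring is left out here so the sketch stays runnable outside a browser.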

Depending on the agreement for #2, it looks like Firefox and Chromium would then need their input behaviour fixing.

Separately there is the question of whether keypress events should have non-trivial values set for the modern fields we specified for use in keydown and keyup events (notably key or code), so I've filed https://github.com/w3c/uievents/issues/349 for that to be discussed.

garykac commented 1 year ago

It is explicitly stated that the key code is given as the unicode code point (or 0). See https://www.w3.org/TR/uievents/#determine-keypress-keyCode.

Note that that entire section is non-normative. We do not intend to normatively specify keypress or the deprecated keyCode and keyChar attributes, although we can certainly add implementation notes.

I meant that this specific statement wasn't stated, but this statement is a given since a broken surrogate pair is not a unicode code point.

From unicode.org:

Surrogates are code points from two special ranges of Unicode values, reserved for use as the leading, and trailing values of paired code units in UTF-16.

So sending a single surrogate code point is technically valid according to the current text of the spec. Allowing a Unicode character made up of 2 surrogates (a surrogate pair) would require the spec to be re-worded.

Add explicit wording to input.data's specification regarding whether implementations MUST, or SHOULD, or needn't, ensure to deliver non-BMP characters whole, or whether events with input.data containing individual surrogates are acceptable.

The spec is actually clear on this. The data attribute is a DOMString, which usually permits unmatched surrogates, but the text in the spec states it should only contain Unicode characters (so maybe the attribute should instead be defined as a USVString). Based on this, Firefox is not correct to include unmatched surrogates.
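For readers less familiar with the DOMString/USVString distinction: converting to a USVString replaces every unpaired surrogate with U+FFFD, so a string is "USVString-safe" exactly when that conversion leaves it unchanged. A minimal sketch of that check (the helper names `toUSVString` and `hasUnpairedSurrogate` are hypothetical):

```typescript
// Hypothetical helper mirroring the DOMString -> USVString conversion:
// every unpaired surrogate code unit is replaced with U+FFFD.
function toUSVString(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += input.charAt(i) + input.charAt(i + 1); // well-formed pair, keep as-is
        i++;
        continue;
      }
    }
    out += isHigh || isLow ? "\ufffd" : input.charAt(i);
  }
  return out;
}

// True when the string contains at least one unpaired surrogate.
function hasUnpairedSurrogate(input: string): boolean {
  return toUSVString(input) !== input;
}

console.log(hasUnpairedSurrogate("\ud834\udf06")); // false: complete pair
console.log(hasUnpairedSurrogate("\ud834"));       // true: lone high surrogate
```

An input event whose data is a single surrogate half would fail this check.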

From my perspective, the primary problem here is when 2 separate event sequences are sent when the user enters a single character. I think this is unexpected and undesirable. In the Firefox example, I get the sense that the main reason for sending multiple input (and other) events is so that it can set the keyCode attribute correctly for each surrogate half.

In the examples above, I think that Safari and Chrome are both doing appropriate things (except that Chrome is not setting the key attribute of the keydown/keyup properly).

Here are my high-level thoughts on this:

Only one event sequence (beforeinput, input) should be sent in response to the user selecting one character.
The UIEvent data attribute might be better defined as a USVString, but we are explicit in the text, so I'm not sure if this is worthwhile.

To fix things, I believe the key changes (Firefox/Chrome) needed are:

Fix it so that unmatched surrogates are not included in the UIEvent data field (to match the current spec).
Only send beforeinput and input events once for emoji (and other surrogates)

To support these fixes, we might need minor spec updates based on how Firefox/Chromium choose to approach this. For example, we could consider any of the following:

Update (non-normative) spec text to redefine keyChar to allow Unicode characters (instead of just code points).
Add a note that the keyChar attribute might not handle surrogates properly (only having the first or last half, for example)
State that multiple keypress events can happen for surrogates.
... (something else)

Note that anything we say in the spec regarding keypress and keyChar will be informational (ie: non-normative). I don't have strong opinions here about these different approaches.

jrandolf commented 1 year ago

Thanks @garykac for your input. It definitely clarifies/reinforces some of the thoughts we discussed in this issue. I've created another issue to discuss the USVString issue for the input event. It also includes the composition events: https://github.com/w3c/uievents/issues/352

As you mentioned, Firefox is the only implementation that doesn't follow that issue at the moment.

Regarding keypress, I think we still would need to stick with the AI's provided by @drwez.

masayuki-nakano commented 1 year ago

(Sorry for the delayed reply; I lost track of notifications while I had COVID-19 early this month.)

Based on this, Firefox is not correct to include unmatched surrogates.

Oh, yeah, it's just a bug.

Those handlers should refer to the KeyboardEvent.key value instead. I'll fix it. (Oh, I realized that we fail to set the .key value of the first keypress to the surrogate pair. I'll fix that too.)

From my experience, if this is standardized, only one behavior should be defined. E.g., UI Events has a non-normative explanation of the keyCode and charCode values of keypress that defines 2 models, but Firefox has since switched from the split model to the conflated model for compatibility with the other browsers. Therefore, defining minor browsers' behavior may just confuse developers, and all browsers should behave the same way on every OS if possible.

hsivonen commented 1 year ago

Note that if you enter e.g. an emote using the Windows On-Screen Keyboard then that will be expressed as two distinct keydown/keypress/keyup sequences, with the keypress part of each describing one of the two UTF-16 surrogates.

This is not the case in Web engines originating from the platform vendor. Here are screenshots from IE and EdgeHTML-based Edge running on Windows 10 2004 showing the page https://hsivonen.com/test/moz/input.html (note that the event log shows the most recent event first) with the following actions taken with focus in the input field:

Pressing the QWERTY-labeled-a key with the keyboard layout set to English
Pressing the QWERTY-labeled-a key with the keyboard layout set to Greek
Pressing the QWERTY-labeled-a key with the keyboard layout set to Fulfulde (ADLaM)
Clicking the emoji 😊 on the on-screen touch keyboard

IE: https://hsivonen.fi/screen/ie-ascii.png https://hsivonen.fi/screen/ie-greek.png https://hsivonen.fi/screen/ie-adlam.png https://hsivonen.fi/screen/ie-emoji.png

EdgeHTML-based Edge: https://hsivonen.fi/screen/edgehtml-ascii.png https://hsivonen.fi/screen/edgehtml-greek.png https://hsivonen.fi/screen/edgehtml-adlam.png https://hsivonen.fi/screen/edgehtml-emoji.png

Notably: In all cases:

There is a single sequence of keyboard events per single Unicode Scalar Value (These screenshots don't show multi-scalar-value emoji, which I've tested earlier; those behave as if a key was pressed for each Unicode Scalar Value component)
The string property shows the whole Unicode Scalar Value as valid UTF-16 string, which is a surrogate pair for non-BMP characters.
The charCode integer is bogus for non-BMP characters.
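To make the "one sequence per Unicode Scalar Value" point concrete, here is a small sketch that splits a well-formed string into scalar values. The helper name `scalarValues` is hypothetical, and the thumbs-up-plus-skin-tone emoji is just an example of a multi-scalar-value emoji, not one of the cases in the screenshots:

```typescript
// Hypothetical helper: split a well-formed UTF-16 string into Unicode scalar values.
function scalarValues(text: string): number[] {
  const result: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Combine the surrogate pair into one scalar value.
        result.push(0x10000 + ((unit - 0xd800) << 10) + (next - 0xdc00));
        i++;
        continue;
      }
    }
    result.push(unit);
  }
  return result;
}

// U+1F44D (thumbs up) followed by U+1F3FD (skin tone modifier):
// one user-perceived character, two scalar values, four UTF-16 code units.
const emoji = "\ud83d\udc4d\ud83c\udffd";
console.log(emoji.length);               // 4
console.log(scalarValues(emoji).length); // 2
```

Under the behavior described above, such an emoji would produce two keyboard event sequences (one per scalar value), not four and not one.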

drwez commented 1 year ago

@hsivonen My description was in relation to the events received from the platform, not the way that those events are interpreted by the user agent, which I think we'd already discussed earlier as differing. :)

The keypress behaviour shown for IE and EdgeHTML don't really make sense, since the charCode field has a bogus value - since keypress is a legacy event, expecting callers to use the key field to get at the real meaning, rather than simply using the standard input event, seems unhelpful. That the two implementations differ in their choice of charCode value suggests that the behaviours were artefacts of an implementation choice, rather than a conscious decision.

hsivonen commented 1 year ago

My description was in relation to the events received from the platform, not the way that those events are interpreted by the user agent

Is it known that IE and EdgeHTML use the same system API surface as Gecko and Blink? Notably, https://learn.microsoft.com/en-us/windows/win32/inputdev/wm-unichar seems to exist.

The keypress behaviour shown for IE and EdgeHTML don't really make sense, since the charCode field has a bogus value

Indeed the charCode part doesn't make sense. However, the key field is consistent with Safari: https://hsivonen.fi/screen/safari-adlam.png . This is a pretty strong indication that it's Web-compatible to emit one sequence of keyboard events per Unicode Scalar Value and to represent the Unicode Scalar Value as two UTF-16 code units in the key field.

That the two implementations differ in their choice of charCode value suggests that the behaviours were artefacts of an implementation choice, rather than a conscious decision.

Yes, but the bogus values suggest that it's not that likely for the Web to be relying on charCode, which means it's quite possible that it would be feasible for other engines to align to Safari's behavior, which (absent Web compat constraints to the contrary) is clearly the best behavior (no unpaired surrogates, charCode integer shows the same scalar value as the key string).

The Chrome Mac behavior (https://hsivonen.fi/screen/chrome-mac-adlam.png) also suggests that it should be Web-compatible to align to the Safari behavior.

From my perspective, the primary problem here is when 2 separate event sequences are sent when the user enters a single character.

I think the primary problem with splitting non-BMP characters across events is that (as far as I know) this is the only case where the environment that JS/Wasm runs in introduces unpaired surrogates. In every other case, environment-supplied DOMStrings are actually well-formed UTF-16 and the only way for a site-supplied program to get an unpaired surrogate in a string returned by a browser API is to first offer an unpaired surrogate as input to a browser API.

Therefore, these events are the only place in the platform that breaks the mappability of DOMString to the native string type of compiled-to-Wasm languages whose native string type's value space is a sequence of Unicode Scalar Values. For practical purposes today, this means Rust, but in principle it also means Swift (which, as I understand it, isn't a common compile-to-Wasm language today).

That multi-scalar-value emoji that is a single user-perceived character and a single press of a Windows 10 touch keyboard "key" gets spread across multiple events is not a problem for the perspective of mappability to Rust (or Swift) strings, since the key field of each event is well-formed UTF-16.

Mac: https://hsivonen.fi/screen/safari-adlam.png https://hsivonen.fi/screen/firefox-mac-adlam.png https://hsivonen.fi/screen/chrome-mac-adlam.png

Firefox on Windows: https://hsivonen.fi/screen/firefox-windows-greek.png https://hsivonen.fi/screen/firefox-windows-adlam.png https://hsivonen.fi/screen/firefox-windows-emoji.png https://hsivonen.fi/screen/firefox-windows-facepalm.png

Chrome on Windows: https://hsivonen.fi/screen/chrome-windows-ascii.png https://hsivonen.fi/screen/chrome-windows-greek.png https://hsivonen.fi/screen/chrome-windows-adlam.png https://hsivonen.fi/screen/chrome-windows-emoji.png https://hsivonen.fi/screen/chrome-windows-facepalm.png

Notably, Chrome on Windows treats Adlam, which is an actual keyboard layout, as an IME even though it treats the emoji touch keyboard as a keyboard!

Considering that Chrome on Windows doesn't even appear to treat non-BMP keyboard layouts as keyboard layouts (even though IE, EdgeHTML, and Firefox treat them as keyboard layouts), I have a really hard time believing that the Web Platform couldn't converge on the combination of Safari and Windows 10 touch keyboard behaviors:

Emit each Unicode Scalar Value as one event sequence like in EdgeHTML with the Windows 10 touch keyboard.
Make the key/data and charCode fields of the events in each such sequence look like they do in Safari on Mac.

masayuki-nakano commented 1 year ago

My description was in relation to the events received from the platform, not the way that those events are interpreted by the user agent

Is it known that IE and EdgeHTML use the same system API surface as Gecko and Blink? Notably, https://learn.microsoft.com/en-us/windows/win32/inputdev/wm-unichar seems to exist.

As far as I've tested with the Emoji palette in the on-screen keyboard of Win10/11, it sends 2 sets of VK_PACKET keydown and keyup. Translating the first WM_KEYDOWN produces a WM_CHAR for the high surrogate, and translating the second WM_KEYDOWN produces a WM_CHAR for the low surrogate. Therefore, it seems that browsers need to wait for the next WM_KEYDOWN when the first WM_KEYDOWN is detected, and stop dispatching keydown and keyup for the first one.

One problem here is, browsers need to keep storing the last surrogate pair if .key of keyup needs to be set to the surrogate pair. I don't know whether there is an API to get last unicode point which was introduced by the preceding WM_KEYDOWN, but I guess there is no such API. (Similar issue occurs for .key of keyup in a dead key sequence.)

The Chrome Mac behavior (https://hsivonen.fi/screen/chrome-mac-adlam.png) also suggests that it should be Web-compatible to align to the Safari behavior.

One of the problems of this approach is that only editable applications can strictly detect text input. (There is no attribute in KeyboardEvent which lets web apps know whether it inputs text or not.) So, web apps need to guess from the modifier state if they handle only keydown events in non-editable elements. (Although Firefox already takes this approach on macOS and Linux for text input coming without keyboard events. bug 1520983 and bug 1712269.)

Notably, Chrome on Windows treats Adlam, which is an actual keyboard layout, as an IME even though it treats the emoji touch keyboard as a keyboard!

How does it work if the field is not editable like readonly mode of Keyboard Event Viewer?

And with a custom keyboard layout created with MSKLC, I see the usual sequence of keyboard events in Chrome for Windows.

So, the Adlam keyboard layout may change its behavior depending on the window class name of the focused window.

drwez commented 11 months ago

One problem here is, browsers need to keep storing the last surrogate pair if .key of keyup needs to be set to the surrogate pair. I don't know whether there is an API to get last unicode point which was introduced by the preceding WM_KEYDOWN, but I guess there is no such API. (Similar issue occurs for .key of keyup in a dead key sequence.)

Browser implementations under Windows could certainly attempt to "collect" the first UTF-16 surrogate rather than propagating it, and then only emit an actual keypress if/when the second surrogate WM_CHAR is received - that would be conceptually similar to the dead-key handling logic.
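As a rough illustration of that collecting approach, here is a minimal JavaScript sketch. It only models the logic: `SurrogateCollector` and `feed` are hypothetical names, each call to `feed()` stands in for one WM_CHAR, and a real browser would do this in its native message handling rather than in script.

```javascript
// Hypothetical model of "collect the first surrogate, emit on the second".
// Each call to feed() stands in for one WM_CHAR carrying a UTF-16 code unit.
class SurrogateCollector {
  constructor() {
    this.pendingHigh = null; // high surrogate waiting for its partner
  }

  // Returns a whole code point to dispatch in one keypress, or null to wait.
  feed(codeUnit) {
    const isHigh = codeUnit >= 0xd800 && codeUnit <= 0xdbff;
    const isLow = codeUnit >= 0xdc00 && codeUnit <= 0xdfff;

    if (isHigh) {
      // Hold it back; any earlier unmatched high surrogate is dropped.
      this.pendingHigh = codeUnit;
      return null;
    }
    if (isLow) {
      const high = this.pendingHigh;
      this.pendingHigh = null;
      if (high === null) return null; // stray low surrogate: nothing to emit
      return (high - 0xd800) * 0x400 + (codeUnit - 0xdc00) + 0x10000;
    }
    // BMP code unit: emit as-is and forget any dangling high surrogate.
    this.pendingHigh = null;
    return codeUnit;
  }
}

const collector = new SurrogateCollector();
// "a" followed by the two halves of U+1D306.
const emitted = [0x61, 0xd834, 0xdf06].map((unit) => collector.feed(unit));
```

Dropping an unmatched surrogate is one possible policy chosen for this sketch; an implementation could equally flush it as its own event.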

Is it known that IE and EdgeHTML use the same system API surface as Gecko and Blink? Notably, https://learn.microsoft.com/en-us/windows/win32/inputdev/wm-unichar seems to exist.

As per the documentation you linked, WM_UNICHAR is provided only as a convenience for use by applications to inject Unicode character input without having to decompose it to UTF-16 code-units. While the default message-handler will decompose it into a pair of WM_CHAR messages, for applications that don't handle it explicitly, it's not a message that the system itself ever sends.

The keypress behaviour shown for IE and EdgeHTML don't really make sense, since the charCode field has a bogus value

Indeed the charCode part doesn't make sense. However, the key field is consistent with Safari: https://hsivonen.fi/screen/safari-adlam.png . This is a pretty strong indication that it's Web-compatible to emit one sequence of keyboard events per Unicode Scalar Value and to represent the Unicode Scalar Value as two UTF-16 code units in the key field.

Sadly, not really - non-BMP keyboard input is still incredibly rare, so it seems plausible that it's not a case that folks are noticing is broken with their implementations yet.

That the two implementations differ in their choice of charCode value suggests that the behaviours were artefacts of an implementation choice, rather than a conscious decision.

Yes, but the bogus values suggest that it's not that likely for the Web to be relying on charCode, which means it's quite possible that it would be feasible for other engines to align to Safari's behavior, which (absent Web compat constraints to the contrary) is clearly the best behavior (no unpaired surrogates, charCode integer shows the same scalar value as the key string).

See above; non-BMP is still so rare that I suspect we're just not (yet) seeing folks impacted by the brokenness of charCode in some implementations.

The Chrome Mac behavior (https://hsivonen.fi/screen/chrome-mac-adlam.png) also suggests that it should be Web-compatible to align to the Safari behavior.

Chrome Mac isn't emitting keypress at all in that example, so I don't think it's relevant to the question?

From my perspective, the primary problem here is when 2 separate event sequences are sent when the user enters a single character.

I think the primary problem with splitting non-BMP characters across events is that (as far as I know) this is the only case where the environment that JS/Wasm runs in introduces unpaired surrogates. In every other case, environment-supplied DOMStrings are actually well-formed UTF-16 and the only way for a site-supplied program to get an unpaired surrogate in a string returned by a browser API is to first offer an unpaired surrogate as input to a browser API.

I think Gary was referring to the fact that Firefox emits two separate input events, for the two surrogates (as does Chrome on Windows).

Firefox and Chrome Windows are consistent with historical behaviour of keypress in this regard - the main issue that they have is that they're then continuing on to emit two distinct input events, which goes against the spec but happens to "work", for the most part. I think we're all in agreement that the browsers should fix that. :)

Since the spec for keypress is not a specification but rather historical documentation, we're constrained, I think, to documenting the set of behaviours that content might need to contend with, which currently includes:

Two keypress each holding one surrogate code-unit in charCode.
One keypress holding a whole Unicode code-point in charCode.
No keypress at all.
One keypress holding only the first surrogate of the pair in charCode.

but clearly some of these behaviours are more reasonable/helpful than others. :)

Notably, Chrome on Windows treats Adlam, which is an actual keyboard layout, as an IME even though it treats the emoji touch keyboard as a keyboard!

That's an interesting observation! Both behaviours seem technically valid, though the Firefox behaviour seems more useful. I wonder what the difference there is.

Considering that Chrome on Windows doesn't even appear to treat non-BMP keyboard layouts as keyboard layouts (even though IE, EdgeHTML, and Firefox treat them as keyboard layouts), I have a really hard time believing that the Web Platform couldn't converge on the combination of Safari and Windows 10 touch keyboard behaviors:

The Web Platform has converged on behaviours for keydown, input and keyup (though some implementations are buggy particularly with regard to input, as we've discussed).

keypress is a legacy event maintained for compatibility with older sites & frameworks, though - as Gary said:

Note that that entire section is non-normative. We do not intend to normatively specify keypress or the deprecated keyCode and keyChar attributes, although we can certainly add implementation notes.

So the spec can document reasonable behaviour in the hope that new implementations will adopt it, and even that existing implementations will converge where feasible without breaking compatibility too much, but the situation differs from the normative specifications. As a concrete example, if Chromium were to migrate charCode to hold the whole Unicode code-point then that would break sites that use String.fromCharCode() to process the field; they'd need updating to use String.fromCodePoint() to remain compatible.
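To make that failure mode concrete, a short sketch using only standard String methods and the U+1D306 example from the top of this issue. The variable names are illustrative only.

```javascript
const codePoint = 0x1d306; // U+1D306, the example character from this issue

// fromCharCode() truncates its argument to 16 bits, so a whole code point
// silently turns into an unrelated BMP character (here U+D306).
const truncated = String.fromCharCode(codePoint);

// fromCodePoint() accepts values above 0xFFFF and yields the surrogate pair.
const whole = String.fromCodePoint(codePoint);

// Two per-surrogate charCode values still work with fromCharCode().
const fromPair = String.fromCharCode(0xd834, 0xdf06);

console.log(truncated === "\ud306"); // true
console.log(whole === fromPair); // true
```

So content written against the two-keypress behaviour keeps working with fromCharCode(), while content receiving a whole code point gets no error, just the wrong character.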

Pauan commented 11 months ago

@drwez Sadly, not really - non-BMP keyboard input is still incredibly rare, so it seems plausible that it's not a case that folks are noticing is broken with their implementations yet.

I don't think that's true. This issue was originally found because of emojis. Emojis have become incredibly commonplace and are used extensively by everybody.

I see people using emojis all the time on websites, e.g. YouTube comment section, Facebook, Twitter. 1.5 Billion tweets use emojis.

So I think if this was a major compat issue for Safari we would have heard about it. Just think about how popular the iPhone is, and how often people use emojis.

drwez commented 11 months ago

This bug is specifically about the legacy keypress event, and by extension the legacy charCode attribute on that event - to encounter the keypress brokenness previously described the user would have to be (1) typing emoji or other non-BMP characters (2) using a browser with broken charCode for non-BMP and (3) into a web-site that is actively processing keypress.charCode to receive characters.

The bug you link to is with "input" events being generated incorrectly and then handled unusually (it sounds like the single-surrogate DOMStrings are being treated as a complete Unicode code-point, somewhere in rust-dominator) and appears to be on Windows, not macOS. It is the case that input as currently defined should only ever report complete Unicode characters in data and fixing that should be safe from a backward-compatibility perspective.
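For content that wants to notice when it is handed a split character in the meantime, a minimal check could look like the following. `hasLoneSurrogate` is a hypothetical helper written for this sketch, not an existing API.

```javascript
// Hypothetical helper: true if the string contains a surrogate code unit
// that is not part of a valid high+low pair.
function hasLoneSurrogate(text) {
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = text.charCodeAt(i + 1); // NaN when past the end
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair, skip the low half
        continue;
      }
      return true; // high surrogate without a following low surrogate
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // low surrogate without a preceding high surrogate
    }
  }
  return false;
}

console.log(hasLoneSurrogate("\ud834\udf06")); // false
console.log(hasLoneSurrogate("\ud834")); // true
```

In a page this could be called on event.data inside an input listener; a true result would indicate one of the two-event implementations discussed here.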

Pauan commented 11 months ago

(1) typing emoji or other non-BMP characters (2) using a browser with broken charCode for non-BMP and (3) into a web-site that is actively processing keypress.charCode to receive characters.

Yes, that applies to a lot of situations. It is very common for websites to use event listeners to monitor comment textboxes. For example, Twitter monitors the textbox so it can update the "maximum characters allowed".

it sounds like the single-surrogate DOMStrings are being treated as a complete Unicode code-point, somewhere in rust-dominator

Incorrect, it is not a dominator or Rust or Wasm bug, it is 100% a browser bug. This was already well established. That bug report is what led hsivonen to file bug reports against the browsers, which then led to this spec bug.

I've been involved in this entire situation from the very beginning, I am well aware of what is going on.

and appears to be on Windows, not macOS

Yes, which is exactly what this bug is about: Chrome and Firefox on Windows are incorrectly generating 2 events when they should generate 1 event. Safari correctly generates 1 event.

So the concern is that if Chrome and Firefox fix their behavior, it could cause compat issues. But because Safari has always had the correct behavior, and using emojis on Safari is very popular, and Safari hasn't had any compat issues, that strongly suggests that it won't cause compat issues for Chrome / Firefox.

drwez commented 11 months ago

(1) typing emoji or other non-BMP characters (2) using a browser with broken charCode for non-BMP and (3) into a web-site that is actively processing keypress.charCode to receive characters.

Yes, that applies to a lot of situations. It is very common for websites to use event listeners to monitor comment textboxes. For example, Twitter monitors the textbox so it can update the "maximum characters allowed".

That's true, but that can be (and often is) done using events & fields other than keypress and charCode.

it sounds like the single-surrogate DOMStrings are being treated as a complete Unicode code-point, somewhere in rust-dominator

Incorrect, it is not a dominator bug, it is 100% a browser bug. This was already well established. That bug report is what led hsivonen to file bug reports against the browsers, which then led to this spec bug.

Yes, I don't think there is any debate that some browsers are currently implementing input incorrectly - resolving that for Chromium is tracked at crbug.com/1450498.

That's a separate issue from keypress, though.

I am well aware of what is going on.

Likewise. :)

and appears to be on Windows, not macOS

Yes, which is exactly what this bug is about: Chrome and Firefox on Windows are incorrectly generating 2 events when they should generate 1 event. Safari correctly generates 1 event.

Again, this bug is specifically about the legacy keypress event, and the charCode field, for which the web platform spec is non-normative. Historically two keypress events have, in the past, in various implementations, been emitted for non-BMP characters - so it can't be said that emitting only one is more (or less) correct. :)

Pauan commented 11 months ago

Again, this bug is specifically about the legacy keypress event, and the charCode field, for which the web platform spec is non-normative. Historically two keypress events have, in the past, in various implementations, been emitted for non-BMP characters

Historically it hasn't always been two keypress. It depends on the browser.

The input and keypress bugs are connected, they're not isolated.

charCode does complicate things a bit, but since Safari has always produced 1 event, and IE / EdgeHTML also produce 1 event, that makes things easier.

When browsers disagree, that makes it easier for the browsers to choose the correct behavior, because there is less concern about compat issues.

That has happened many times in the past, where browsers disagreed on the behavior, and so it was easy to align all of the browsers to the correct behavior.

so it can't be said that emitting only one is more (or less) correct. :)

No, the correct behavior is obviously to have 1 event. The only reason for having 2 events is for historical compat reasons.

That's why we're discussing the probability of compat issues. If the probability is low, then perhaps the browsers can just fix the bug. That has happened before.

You claim the probability of compat issues is high, because non-BMP characters are rarely used. But as I said in my earlier post, that's not true, because emojis are non-BMP and they're commonly used.

drwez commented 11 months ago

The input and keypress issues are related, but different:

input is defined by this spec, and its specified behaviour already matches the one-event model that will address the bug you linked - so no spec changes are required, only fixes by the browser vendors.

keypress has a non-normative description in the spec, and is explicitly provided for "historical compat"ibility, with content that pre-dates the input specification.

Again, the fact that emoji are commonly-used doesn't necessarily mean that they are commonly-used in conjunction with web content that happens to also use the charCode field of keypress events to receive them (instead of input/data) - and even in the implementations with broken keypress.charCode or input.data the implementations do end up with non-BMP characters correctly appearing in standard text fields.

Given that the input spec already does what you describe, and that it falls on vendors to apply fixes for that, do you object to having the existing keypress behaviours better-described by the non-normative portion of the spec? If so then could you provide a specific alternative proposal?

Pauan commented 11 months ago

@drwez do you object to having the existing keypress behaviours better-described by the non-normative portion of the spec? If so then could you provide a specific alternative proposal?

Currently the spec doesn't define the behavior of charCode at all, and there are major inconsistencies with charCode in the different browsers. The spec even explicitly says:

In practice, keyCode and charCode are inconsistent across platforms and even the same implementation on different operating systems or using different localizations. This specification does not define values for either keyCode or charCode, or behavior for charCode.

Although it's a legacy API, it's still commonly used, so it's still important for its behavior to be consistent among browsers.

Ideally the spec should be changed so that charCode is more tightly specified (perhaps aligning with Safari), and that charCode should never contain surrogate pairs.

If those spec changes cannot be made (for compat reasons), then we just have to accept that.

So the big question is: how likely are there to be compat issues if Chrome / Firefox align to Safari's behavior? That will decide what sort of spec changes (if any) need to be made.

drwez commented 11 months ago

@drwez do you object to having the existing keypress behaviours better-described by the non-normative portion of the spec? If so then could you provide a specific alternative proposal?

Currently the spec doesn't define the behavior of charCode at all

Are you sure you're looking at the latest draft? https://www.w3.org/TR/uievents/#dom-keyboardevent-charcode has a(n admittedly self-contradictory[1]) description of the common behaviour.

Specifically note the expectation that a charCode is a charCode in the DOMString sense of the term, such that it can be passed to String.fromCharCode() for example.

[1] Which is in part what led to this spec bug :)

Although it's a legacy API, it's still commonly used, so it's still important for its behavior to be consistent among browsers.

While that would be ideal, what's most important is that its behaviour is consistent with what content has previously been led/forced to accommodate. Historically the keypress event and the content of charCode have not been consistent across platforms, even for the same cross-platform browser. It has been common practice for content to accommodate the various different behaviours by checking the browser, platform and/or versions in the User-Agent string.

There has been a push to use e.g. the presence or absence of fields to detect what's needed (e.g. this was the case historically with which vs keyCode etc) but there have also been behaviours that are harder to accommodate that way (e.g. keyCode used Windows-style VKEYs under IE, and WebKit-based browsers on all platforms, but a different system under Firefox on Linux, IIRC).

Ideally the spec should be changed so that charCode is more tightly specified (perhaps aligning with Safari), and that charCode should never contain surrogate pairs.

If those spec changes cannot be made (for compat reasons), then we just have to accept that.

Right; the spec cannot mandate any particular behaviour, in general, since this is a legacy compatibility event.

The spec could recommend a behaviour, if an implementation is free of compatibility concerns.

So the big question is: how likely are there to be compat issues if Chrome / Firefox align to Safari's behavior? That will decide what sort of spec changes (if any) need to be made.

Right; enumerating the four main implementations we have:

1. Don't emit keypress at all for non-BMP characters, only input.
   - Reasonable given that legacy content typically pre-dates non-BMP being relevant, and new content should be migrating to input anyway.
   - Would break content relying on two-keypress w/ charCode holding the code-units, as under Chrome on Windows, and Firefox across all(?) platforms.
   - Would break content expecting keypress with charCode containing a character code-point, as is possible with Safari, IIRC.
2. Emit two keypress events, with charCode set to each of the surrogates.
   - Awkward to implement for non-Windows platforms.
   - Breaks compatibility for "whole character" charCode browsers, like Safari.
   - Does not line-up well with input (e.g. what happens if one or other keypress is cancelled?).
3. Emit a single keypress with charCode set to a whole character.
   - Breaks content that assumes that charCode can be e.g. passed to String.fromCharCode().
   - Only content designed for Safari would presumably already have arranged to be compatible?
4. Emit a single keypress with charCode set to one of the surrogates.
   - charCode is not useful, and content that uses it will not work correctly, including stuff that previously would have worked under Chrome/Windows.
   - Implies that content should use the key field to get the actual character data, which would be strange, since if content is going to be updated, it should be updated to use input.

To make another specific proposal, I'd suggest the spec:

[0. Continue to try to get input fixed in the implementations!]

1. Acknowledge options 1, 2 and 3 as known implementations, preferably with some context to help folks understand how common to expect them to be.
2. Recommend one of the options as preferred for implementations with no unique compat considerations of their own. Maybe also discourage option 4.
3. Provide a code-snippet to show how to patch up existing fromCharCode()-using content to work with both options 2 and 3.
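One possible shape for such a snippet, as a sketch rather than proposed spec text. `createCharCodeDecoder` is a hypothetical helper; it assumes content currently appends String.fromCharCode(event.charCode) on each keypress and would call the decoder instead.

```javascript
// Hypothetical helper: returns a function that turns successive keypress
// charCode values into text, for both the two-keypress (per-surrogate)
// and single-keypress (whole code point) behaviours.
function createCharCodeDecoder() {
  let pendingHigh = null; // high surrogate seen in the previous keypress

  return function decode(charCode) {
    if (charCode > 0xffff) {
      // Whole code point in a single keypress.
      pendingHigh = null;
      return String.fromCodePoint(charCode);
    }
    if (charCode >= 0xd800 && charCode <= 0xdbff) {
      // First of two keypress events: wait for the low surrogate.
      pendingHigh = charCode;
      return "";
    }
    if (charCode >= 0xdc00 && charCode <= 0xdfff) {
      // Second of two keypress events: join with the stored high surrogate.
      const high = pendingHigh;
      pendingHigh = null;
      return high === null ? "" : String.fromCharCode(high, charCode);
    }
    // Ordinary BMP character.
    pendingHigh = null;
    return String.fromCharCode(charCode);
  };
}

// Simulated event streams for U+1D306 under each behaviour.
const decodeTwoEvents = createCharCodeDecoder();
const viaTwoEvents = decodeTwoEvents(0xd834) + decodeTwoEvents(0xdf06);

const decodeOneEvent = createCharCodeDecoder();
const viaOneEvent = decodeOneEvent(0x1d306);

console.log(viaTwoEvents === viaOneEvent); // true
```

In a page, the decoder would be called with event.charCode from a keypress listener. An implementation that emits only one surrogate (option 4) yields nothing here, and one that emits no keypress (option 1) never calls it, so such content would still need input as a fallback.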