w3c / uievents

UI Events
https://w3c.github.io/uievents/

no way to distinguish printable key values from non-printable #264

Open Yaffle opened 4 years ago

Yaffle commented 4 years ago

Is there a way to tell whether event.key is printable? The advice found online is to check it against a hardcoded list of special values: https://www.aarongreenlee.com/blog/list-of-non-printable-keys-for-keyboard-events-when-using-event-key/ .

juj commented 2 years ago

Thanks for filing this issue @Yaffle . We have been struggling with this problem in the past at Unity as well.

Currently there is no specified way to tell whether a given KeyboardEvent.key value is printable text or one of the named non-printable control keys.

This is causing repeated bug reports about international IME handling to Unity WebGL, where UI input elements are implemented via rendering in WebGL (e.g. as part of rendered game content), and not using DOM input elements. Any other WebGL sites that need to do international keyboard input likely face the exact same issue.

Reading our git log, we have landed numerous partial patches in response to international bug reports, and we are still struggling to get the heuristics "quite right".

Searching the web, people offer somewhat random and hacky(?) solutions to the issue, e.g. event.key is printable if event.key.length == 1.

However, this does not work for characters outside the Basic Multilingual Plane, which UTF-16 encodes as a surrogate pair, so event.key.length is 2 for a single character. More complex composed characters (a base character followed by combining marks) also come to mind as a possible problem.
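To make the problem concrete, here is a minimal sketch that compares the UTF-16 length of a few key values with their code point count. The helper name describeKey and the sample values are illustrative assumptions, not values captured from a real device or IME.

```javascript
// Compare the UTF-16 length of a key value with its code point count.
// describeKey is a hypothetical helper name used only for this illustration.
function describeKey(key) {
  return {
    utf16Length: key.length,            // counts UTF-16 code units
    codePoints: Array.from(key).length, // the string iterator walks code points
  };
}

// Illustrative samples: a BMP letter, an emoji outside the BMP,
// "e" followed by a combining acute accent, and a named key.
for (const key of ["a", "\u{1F600}", "e\u0301", "Enter"]) {
  console.log(JSON.stringify(key), describeKey(key));
}
```

The emoji is one code point but two code units, so a length == 1 test rejects it, while the combining sequence is two code points, so a "single code point" test would reject that one as well.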

So far we are looking to decode the given key into Unicode code points, test whether it is a single code point, and assume the key is printable if so, but it is unclear whether that is guaranteed to hold.

There are also reports of odd behavior on mobile devices: https://stackoverflow.com/a/70866532 advises developers to check for the string Unidentified. I am not completely sure what that is about, but it goes to show that the spec has a gap today that requires people to basically "test your code on all devices and IME languages" for something that could be a "spec-solved" problem.

Another "solution", posted in @Yaffle's comment above, is to explicitly blacklist all the control sequence strings currently defined in the spec. However, that is not a good solution either: it grows the size of the delivered web site code by quite a bit, and if/when new control sequences are added to the spec in the future, existing web pages will break.
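A rough sketch of that blacklist approach shows why it is fragile. The names NAMED_KEYS and isPrintableByDenylist are hypothetical, and the list is deliberately partial; a real list would have to mirror every named key value in the spec.

```javascript
// Deliberately partial list of named key values, for illustration only.
// A complete list would need to track every named value in the spec.
const NAMED_KEYS = new Set([
  "Unidentified", "Dead", "Enter", "Tab", "Backspace", "Delete", "Escape",
  "Shift", "Control", "Alt", "Meta", "CapsLock",
  "ArrowUp", "ArrowDown", "ArrowLeft", "ArrowRight",
  "Home", "End", "PageUp", "PageDown", "F1", "F2",
]);

// Treats every key value that is not in the list as printable.
function isPrintableByDenylist(key) {
  return !NAMED_KEYS.has(key);
}

console.log(isPrintableByDenylist("a"));     // listed? no, so reported printable
console.log(isPrintableByDenylist("Enter")); // listed, so reported non-printable
console.log(isPrintableByDenylist("F3"));    // missing from the list, so misreported as printable
```

Any named key that is absent from the list, whether through omission or because it was added to the spec later, is silently classified as printable text.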

So in summary, there seems to be room for the KeyboardEvent spec to introduce a KeyboardEvent.isPrintable field or a KeyboardEvent.isControlSequence field that unambiguously distinguishes whether a given keyboard event carries printable text input or one of the browser-generated non-printable control sequences.

In addition, it would be great to get guidance from browser vendors on the most compatible and future-proof way to implement this today, until the spec adds such a field.

juj commented 2 years ago

So far we are looking to decode the given key into Unicode code points, test whether it is a single code point, and assume the key is printable if so, but it is unclear whether that is guaranteed to hold.

More specifically, what we are looking at is the following:

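function isPrintableKeyEvent(event) {
  // First UTF-16 code unit of the key value.
  var codeUnit = event.key.charCodeAt(0);
  // Lead (high) surrogates are 0xD800-0xDBFF; 0xDC00-0xDFFF are trail surrogates.
  var isLeadSurrogate = (codeUnit >= 0xD800 && codeUnit <= 0xDBFF);
  // Printable if the key is one BMP code unit or one surrogate pair.
  return event.key.length == 1 || (event.key.length == 2 && isLeadSurrogate);
}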

but it would be great to get advice from the Google and Mozilla teams on whether that is guaranteed to capture all printable keyboard events. (It will certainly leave out all the current non-printable control sequences, and it seems unlikely that a control sequence consisting of a single code point will ever be introduced.)

laughinghan commented 4 months ago

@juj I think that's not right: the spec mentions the possibility of multiple Unicode code points if there are composing characters. But wouldn't a simpler algorithm be: if .key is ASCII-only and longer than 1 character, it's non-printable; otherwise (if it's exactly 1 character or it contains any non-ASCII character) it's printable? Or in code, /^[\x00-\x7F]{2,}$/.test(event.key), which is true for the non-printable values?
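Wrapped in a function, that heuristic could look like the sketch below. The names NAMED_KEY_PATTERN and isPrintableKey are hypothetical, and the approach rests on an assumption the spec does not state normatively: every named key value is ASCII-only and at least two characters long, and no printable value is multi-character ASCII.

```javascript
// Matches key values made of two or more ASCII characters, e.g. "Enter".
// Assumption: all named (non-printable) key values have this shape.
const NAMED_KEY_PATTERN = /^[\x00-\x7F]{2,}$/;

// Returns true when the key value is assumed to be printable text.
function isPrintableKey(key) {
  return !NAMED_KEY_PATTERN.test(key);
}

// Illustrative key values, not captured from a real device.
for (const key of ["a", "Enter", "Unidentified", "\u{1F600}", "e\u0301"]) {
  console.log(JSON.stringify(key), isPrintableKey(key));
}
```

Unlike the surrogate-pair check, this accepts a base character followed by a combining mark, because any non-ASCII code unit makes the pattern fail.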

Granted, it wouldn't hurt to mention this invariant as a non-normative note in the spec.