IME support in actions

jgraham commented 2 years ago

WebDriver is currently unable to simulate the action of an IME in user input.

IMEs are widely used, particularly when inputting scripts that have far more characters than there are keys on a keyboard. That means it's currently impossible to use WebDriver to adequately test how web applications behave when these scripts are entered. Behaviour in the face of IME input is also an interoperability problem for web browsers, and fixing it is seen as a high-priority area for the web.

Conceptually, an IME sits between the physical input layer and the application. Typically the IME is activated by some device input, e.g. pressing a key on the keyboard. Once triggered, the IME generates a candidate composed string that may be updated based on further input, and that is at some point committed. During this time the composed string is typically displayed in the application, but styled in a way that makes it distinct from the final input. There is also typically IME-specific UI to suggest different possible completions, but this is quite platform-specific and will be considered out of scope.

In WebDriver the low-level input handling is done through the actions API. This models user input as a set of virtual input devices, which each have an internal state. At each point in time ("tick") an input device can either do nothing ("pause"), or can have an associated action that updates its internal state and causes the relevant events to be emitted to content (e.g. a keyDown action on a key input device will update the internal WebDriver state to signify that the key is depressed, and emit a keydown event to content).
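As a rough illustration of the tick model, the sketch below groups each input source's actions by tick index, so tick i holds the i-th action of every source. It is a simplified sketch, not the spec's algorithm text: it assumes sources arrive as parsed JSON objects with `id` and `actions` keys (mirroring the wire format), and the name `actions_by_tick` is hypothetical.

```python
from typing import Any

Action = dict[str, Any]


def actions_by_tick(sources: list[dict[str, Any]]) -> list[list[tuple[str, Action]]]:
    """Group every input source's actions by tick.

    Tick i holds the i-th action of each source. A source with fewer actions
    than the longest one simply contributes nothing to the later ticks.
    """
    ticks: list[list[tuple[str, Action]]] = []
    for source in sources:
        for i, action in enumerate(source["actions"]):
            if len(ticks) <= i:
                ticks.append([])  # first source to reach this tick creates it
            ticks[i].append((source["id"], action))
    return ticks


sources = [
    {"type": "key", "id": "keyboard-1",
     "actions": [{"type": "keyDown", "value": "a"},
                 {"type": "keyUp", "value": "a"}]},
    {"type": "none", "id": "idle-1",
     "actions": [{"type": "pause"}]},
]
for index, tick in enumerate(actions_by_tick(sources)):
    print(index, tick)
```

Each tick is then dispatched as a unit, which is why the IME proposal below has to say how an IME action relates to the device actions that share its tick.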

For a given input on a given device, the IME can either do nothing (i.e. just let the event pass through) or intercept the event, update its internal state, and cause different events to be emitted instead. For example, consider pressing the "a" key. In the absence of an IME this will cause a keydown event with keyCode 65, a keypress event, possibly various input events, and finally a keyup event also with keyCode 65. However, if the IME is active, we get a keydown event with keyCode 229, a compositionstart event, a compositionupdate event with data corresponding to the current IME input selection, input events, and finally a keyup event with keyCode 65. Note that in this example content never sees a keydown event with keyCode 65: the fact that the IME intercepted the event changes the key events visible to the page.
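The following toy model lists the events content would observe for one key press with and without an active IME, following the description above. It is a sketch under stated assumptions: it only handles single letter keys, uses the legacy `keyCode` of the uppercase letter (65 for "a"), emits a single input event, and assumes the IME just appends the pressed letter to the composition string. `Event` and `events_for_keypress` are hypothetical names, not part of any browser API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    type: str
    key_code: Optional[int] = None
    data: Optional[str] = None


def events_for_keypress(key: str, ime_active: bool, composition: str = "") -> list[Event]:
    """Return the events content observes when `key` is pressed and released.

    `composition` is the composition string before this key press; it is only
    consulted when the IME is active.
    """
    key_code = ord(key.upper())  # legacy keyCode of a letter key, e.g. 65 for "a"
    if not ime_active:
        return [
            Event("keydown", key_code=key_code),
            Event("keypress"),
            Event("input", data=key),
            Event("keyup", key_code=key_code),
        ]
    # With an active IME the keydown is reported with keyCode 229 instead,
    # and the effect of the key is described by composition events.
    events = [Event("keydown", key_code=229)]
    if not composition:
        events.append(Event("compositionstart"))  # only when a new composition begins
    updated = composition + key  # assumption: the IME simply appends the key
    events += [
        Event("compositionupdate", data=updated),
        Event("input", data=updated),
        Event("keyup", key_code=key_code),
    ]
    return events


for event in events_for_keypress("a", ime_active=True):
    print(event)
```

In the IME case no keydown with keyCode 65 is produced, while the keyup still carries 65, which is the asymmetry the proposal needs to be able to reproduce.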

Later an input (or something not visible to the web page) may cause the composition to be committed, which corresponds to a compositionend event.

IMEs can also apply to non-key input, e.g. handwriting recognition is a form of IME that depends on pointer input. An IME may also depend on multiple kinds of input.

In terms of the implementation inside the WebDriver spec, the obvious thing would be to add IME as a new kind of input source for actions. However, the fact that an IME is a layer between the "physical" input devices and the application makes this more complex. To handle cases like "key is pressed and intercepted by IME, other events happen, key is released" we need to a) specify which other inputs in a given tick are being intercepted by the IME and b) handle the IME-generated state changes after all other inputs (maybe even right at the end of the tick; for something like pointerMove, which can be spread out into multiple events over time, it's not clear how things should work).
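One possible reading of requirements a) and b) is sketched below. Before dispatching a tick, the IME actions' `handles` references decide which device inputs are intercepted; those devices still update their internal state, but their normal events are suppressed, and the IME's events are emitted after every other input in the tick. This is a hedged illustration of the ordering only: `dispatch_tick`, the string event log, and the key-only device handling are assumptions, and it deliberately ignores the open question of continuous actions such as pointerMove.

```python
from typing import Any

Action = dict[str, Any]


def dispatch_tick(tick: list[tuple[str, Action]],
                  pressed: dict[str, set[str]]) -> list[str]:
    """Dispatch one tick and return the emitted DOM events as strings.

    `tick` holds (source_id, action) pairs for this tick. `pressed` maps each
    key input source id to the keys it currently holds down.
    """
    ime_actions = [a for _, a in tick if a["type"] in ("compositionUpdate", "compositionEnd")]
    # a) Decide up front which device inputs the IME intercepts this tick.
    intercepted = {a["handles"] for a in ime_actions if "handles" in a}

    events: list[str] = []
    intercepted_kinds: dict[str, str] = {}
    for source_id, action in tick:
        kind = action["type"]
        if kind not in ("keyDown", "keyUp"):
            continue  # pauses and IME actions are not device input
        keys = pressed.setdefault(source_id, set())
        # The device's own state is updated even when the IME intercepts it.
        if kind == "keyDown":
            keys.add(action["value"])
        else:
            keys.discard(action["value"])
        if source_id in intercepted:
            intercepted_kinds[source_id] = kind  # the IME decides what content sees
        else:
            events.append(f"{kind.lower()} {action['value']!r}")

    # b) IME-generated events are emitted after all other inputs in the tick.
    for action in ime_actions:
        handled = action.get("handles")
        if handled in intercepted_kinds:
            # Content sees the intercepted key event, but with keyCode 229.
            events.append(f"{intercepted_kinds[handled].lower()} keyCode=229")
        events.append(f"{action['type'].lower()} {action.get('data')!r}")
    return events
```

Splitting the work into two passes is what lets the IME override events for inputs that appear earlier in the same tick.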

So a possible proposal is as follows:

We add a new input source type, ime, whose internal state is the current composition string.

The ime input source has two associated actions: compositionUpdate and compositionEnd.

compositionUpdate is the main action for updating the composed string. It has the following properties:

- data - A string containing the updated value of the composition string. If this is null (or the empty string?) the composition ends.
- clauses (optional) - These represent sub-parts of the composition string. Each clause has a length and a type, and the lengths must add up to the total length of data. Suggested values of type are “caret”, “rawInput”, “converted”, “notConverted”, “targetConverted” (TODO: clarify the semantics of these). In addition, formatting hints may be specified according to how the IME would like the range to be handled: underlineColor, underlineStyle, backgroundColor, textColor. If clauses is omitted, it's assumed that there's a single clause (TODO: details).
- handles (optional) - The id of the input source that caused this change in the IME state. If this is provided, the internal state of the referenced input source is updated, but the DOM events emitted are those appropriate to the IME instead (e.g. for a keyboard the keyCode property becomes 229). If this property is omitted, the update to the IME state is not connected to any application-visible input source change (this corresponds to the situation where, e.g., the user clicks on a composition string option in a window outside their browser window).
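A remote end might validate a compositionUpdate action against these properties roughly as follows. This sketch makes several assumptions: clauses are objects with `length` and `type` keys, the allowed types are exactly the suggested ones (whose semantics are still TODO), both null and empty `data` end the composition, and lengths are counted in Python code points (a real implementation would have to pick a unit, e.g. UTF-16 code units as in JavaScript). `validate_composition_update` and `InvalidArgument` are hypothetical names; the latter only stands in for WebDriver's "invalid argument" error.

```python
from typing import Any, Optional

# Suggested clause types from the proposal (semantics still TODO there).
CLAUSE_TYPES = {"caret", "rawInput", "converted", "notConverted", "targetConverted"}


class InvalidArgument(ValueError):
    """Stand-in for WebDriver's "invalid argument" error."""


def validate_composition_update(action: dict[str, Any], source_ids: set[str]) -> Optional[str]:
    """Validate a compositionUpdate action and return its new composition string.

    Returns None when the action ends the composition instead.
    """
    handles = action.get("handles")
    if handles is not None and handles not in source_ids:
        raise InvalidArgument(f"handles refers to unknown input source {handles!r}")

    if "data" not in action:
        raise InvalidArgument("data is required")
    data = action["data"]
    if data is not None and not isinstance(data, str):
        raise InvalidArgument("data must be a string or null")
    if not data:
        return None  # assumption: both null and "" end the composition

    clauses = action.get("clauses")
    if clauses is not None:
        for clause in clauses:
            if clause.get("type") not in CLAUSE_TYPES:
                raise InvalidArgument(f"unknown clause type {clause.get('type')!r}")
            length = clause.get("length")
            if isinstance(length, bool) or not isinstance(length, int) or length < 0:
                raise InvalidArgument("clause length must be a non-negative integer")
        # Lengths are counted in Python code points here (an assumption).
        if sum(clause["length"] for clause in clauses) != len(data):
            raise InvalidArgument("clause lengths must add up to the length of data")
    return data
```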

A compositionEnd action causes the composed string to be committed. It has the following properties:
- data (optional) - The final composed string to insert. If omitted, this is given by the data property of the previous compositionUpdate action.
- handles (optional) - As for `compositionUpdate`: if committing the composition happens in response to a content-visible input action, this is a reference to the input source id for that action.
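The ime source's internal state and its two actions could be modelled along these lines. This is a sketch, not a specified algorithm: the class `VirtualIme` and its method names are hypothetical, events are returned as (type, data) pairs rather than dispatched, compositionstart is emitted on the first update of a new composition (as in the key press example earlier in the thread), and the unsettled "null or empty data ends the composition" case is rejected rather than guessed at.

```python
from typing import Optional

Event = tuple[str, Optional[str]]  # (DOM event type, data)


class VirtualIme:
    """Toy model of the proposed `ime` input source and its two actions."""

    def __init__(self) -> None:
        # Internal state: the current composition string, or None when idle.
        self.composition: Optional[str] = None

    def composition_update(self, data: str) -> list[Event]:
        if not data:
            # The proposal leaves "null or empty data ends the composition"
            # open, so this sketch refuses it rather than guessing.
            raise ValueError("empty data is unspecified; use composition_end")
        events: list[Event] = []
        if self.composition is None:
            events.append(("compositionstart", None))  # first update of a new composition
        self.composition = data
        events.append(("compositionupdate", data))
        return events

    def composition_end(self, data: Optional[str] = None) -> list[Event]:
        if self.composition is None:
            return []  # nothing is being composed, so there is nothing to commit
        # Without explicit data, commit the string from the previous update.
        committed = self.composition if data is None else data
        self.composition = None
        return [("compositionend", committed)]


ime = VirtualIme()
print(ime.composition_update("abc"))
print(ime.composition_update("ABC"))
print(ime.composition_end())
```

Replaying the IME actions from the "abc" / "ABC" scenario in this way ends with a compositionend carrying "ABC", because compositionEnd falls back to the last compositionUpdate data.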

Here is an example of what this looks like on the wire when we press "a" on the keyboard, the IME generates the composed string "abc", something outside the browser updates it to "ABC", and the composition is committed with the space key:

```json
{"actions": [
    {"type": "key",
     "id": "keyboard-1",
     "actions": [
       {"type": "keyDown",
        "value": "a"},
       {"type": "keyUp",
        "value": "a"},
       {"type": "pause"},
       {"type": "keyDown",
        "value": " "},
       {"type": "keyUp",
        "value": " "}
     ]
    },
    {"type": "ime",
     "id": "ime-1",
     "actions": [
       {"type": "compositionUpdate",
        "handles": "keyboard-1",
        "data": "abc"},
       {"type": "pause"},
       {"type": "compositionUpdate",
        "data": "ABC"},
       {"type": "compositionEnd",
        "handles": "keyboard-1"},
       {"type": "pause"}
     ]
    }
  ]
}
```
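For illustration, the same payload can be built and sanity-checked in Python. The checks only cover properties stated in the proposal (every `handles` value names a declared input source, and both sources have one action per tick); this is not a WebDriver client. `body` is what a client would send to the Perform Actions endpoint (`POST /session/{session id}/actions`), although only an implementation of this proposal would accept an `ime` source.

```python
import json

keyboard = {
    "type": "key",
    "id": "keyboard-1",
    "actions": [
        {"type": "keyDown", "value": "a"},
        {"type": "keyUp", "value": "a"},
        {"type": "pause"},
        {"type": "keyDown", "value": " "},
        {"type": "keyUp", "value": " "},
    ],
}
ime = {
    "type": "ime",
    "id": "ime-1",
    "actions": [
        {"type": "compositionUpdate", "handles": "keyboard-1", "data": "abc"},
        {"type": "pause"},
        {"type": "compositionUpdate", "data": "ABC"},  # update from outside the browser
        {"type": "compositionEnd", "handles": "keyboard-1"},  # committed by the space key
        {"type": "pause"},
    ],
}
payload = {"actions": [keyboard, ime]}

# Every handles reference must point at a declared input source.
source_ids = {source["id"] for source in payload["actions"]}
for source in payload["actions"]:
    for action in source["actions"]:
        handles = action.get("handles")
        assert handles is None or handles in source_ids, handles

# Show which keyboard and IME actions share each tick.
for tick, pair in enumerate(zip(keyboard["actions"], ime["actions"])):
    print(tick, [action["type"] for action in pair])

body = json.dumps(payload)
```

Printed tick by tick, tick 0 pairs keyDown "a" with the first compositionUpdate, and tick 3 pairs keyDown " " with compositionEnd, which is how `handles` ties each IME action to the key event it intercepts.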

jgraham commented 2 years ago

More detail on clauses: characters in the IME data can either be user input or some kind of IME output (i.e. a converted character). These are typically displayed differently in the editor. So APIs can either provide semantic information about which kind of text each character is, and use that to set the style, or just provide styling information for each character. In either case the user relies on the displayed text style to interpret the composition string.
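To make the two approaches concrete, the sketch below splits a composition string into per-clause ranges and derives a display style from the semantic clause type, letting explicit formatting hints override it. The default style table is purely an assumption for illustration (the proposal leaves clause semantics as TODO, and real editors choose their own rendering); `clause_ranges` and `DEFAULT_STYLE` are hypothetical names.

```python
from typing import Any

# Hypothetical default styling per semantic clause type.
DEFAULT_STYLE = {
    "rawInput": {"underlineStyle": "dotted"},
    "notConverted": {"underlineStyle": "dotted"},
    "converted": {"underlineStyle": "solid"},
    "targetConverted": {"underlineStyle": "solid", "backgroundColor": "highlight"},
}
# Formatting hints named in the proposal.
STYLE_HINTS = ("underlineColor", "underlineStyle", "backgroundColor", "textColor")


def clause_ranges(data: str, clauses: list[dict[str, Any]]) -> list[tuple[str, str, dict[str, str]]]:
    """Split `data` into (text, type, style) triples, one per clause.

    Explicit formatting hints on a clause override the default for its type.
    """
    result = []
    start = 0
    for clause in clauses:
        end = start + clause["length"]
        style = dict(DEFAULT_STYLE.get(clause["type"], {}))  # semantic default
        style.update({k: clause[k] for k in STYLE_HINTS if k in clause})  # explicit hints win
        result.append((data[start:end], clause["type"], style))
        start = end
    if start != len(data):
        raise ValueError("clause lengths must add up to the length of data")
    return result


print(clause_ranges("ABcd", [
    {"length": 2, "type": "converted"},
    {"length": 2, "type": "rawInput", "textColor": "gray"},
]))
```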

The documentation for the win32 API has some details here: https://docs.microsoft.com/en-us/windows/win32/intl/composition-string and the proposed values here largely match those in that API.

css-meeting-bot commented 2 years ago

The Browser Testing and Tools Working Group just discussed Actions IME support.

The full IRC log of that discussion

<AutomatedTester> topic: Actions IME support
<AutomatedTester> github: https://github.com/w3c/webdriver/issues/1683
<AutomatedTester> jgraham (IRC): This is also relevant to webdriver classic
<AutomatedTester> ... the issue here is a proposal on how to handle ime input in webdriver
<AutomatedTester> ... IME is input method editor
<AutomatedTester> ... it is commonly used in languages where you can't type the characters directly
<AutomatedTester> ... [describes examples]
<AutomatedTester> ... there are a lot of web compat issues in editor libraries because they can't test IME
<AutomatedTester> ... [describes input breakage in Gecko]
<AutomatedTester> ... for those who have heard of Interop 22... part of that is working on interop in input
<AutomatedTester> ... in webdriver, the lowest level inputs is actions that allows you to send through the keyboard, pointer events and so on
<AutomatedTester> ... with IME you press a key and that intercepts and a different event is fired. e.g. A would change it to the keycode and then do composition
<AutomatedTester> ... the webpage gets composition events
<AutomatedTester> ... [explains different composition methods]
<AutomatedTester> ... the proposal is we add a new input type called IME
<AutomatedTester> ... this has 2 actions, `compositionUpdate`
<AutomatedTester> ... the other action is `compositionEnd`
<AutomatedTester> ... so the WebDriver-specific thing that's not clear is how these things hook together
<AutomatedTester> ... [explains IME and Keyboard]
<AutomatedTester> q+
<jgraham> q?
<BrandonWalderman> q+
<jgraham> ack
<jgraham> AutomatedTester: Historically WebDriver (Selenium) had IME support built in. It was handled by actions trying to inject directly into the event queue. There was special C++ code required to handle it. That's why we didn't do this and focused on US keyboard input. We did allow actions to handle sending specific unicode characters so you could input final composed characters.
<jgraham> AutomatedTester: Required specific install on the machine.
<jgraham> AutomatedTester: Is it easier to implement now?
<jgraham> AutomatedTester: High level actions seem OK, but is it implementable?
<AutomatedTester> ack next
<AutomatedTester> jgraham (IRC): this is a case that benefits from being supported directly in the browser
<AutomatedTester> ... the proposal as it is at the moment... it's a mid-layer proposal
<AutomatedTester> ... we won't go to the OS IME
<AutomatedTester> ... we will provide enough data to the browser so it could inject the relevant events
<AutomatedTester> ... this should be implementable and it can be implemented in gecko
<LanWei> q+
<AutomatedTester> ... [explains how we need to maintain some states]
<AutomatedTester> ack next
<AutomatedTester> Brandon Walderman: I support this feature request
<AutomatedTester> ... we had an intern do some of this in Chromium for CDP
<AutomatedTester> ... the building blocks are already in chromium so it's a case of adding this to chromedriver
<AutomatedTester> ack next
<AutomatedTester> Lan Wei: I was working on the actions implementation
<AutomatedTester> ... we have looked at this and it's very hard to implement
<AutomatedTester> ... could you explain the client API
<AutomatedTester> jgraham (IRC): so from the point of view of webdriver user
<AutomatedTester> ... it doesn't ever interact with an IME on the machine
<AutomatedTester> ... we will emulate it
<karlcow> q+ to ask about gecko on different platforms
<AutomatedTester> Lan Wei: do you have language type as an input
<AutomatedTester> jgraham (IRC): the proposal doesn't have a way to handle any configurations... e.g. different IMEs handle different combinations to get a different order of events
<AutomatedTester> Lan Wei: do you have any plan on when we want to work on this API?
<AutomatedTester> jgraham (IRC): since this is part of Interop 2022, there is pressure to get this done quickly
<AutomatedTester> ... we would love feedback now
<AutomatedTester> q?
<AutomatedTester> ack next
<Zakim> karlcow, you wanted to ask about gecko on different platforms
<AutomatedTester> karlcow (IRC): I wanted to ask jgraham (IRC) ...do we need a different test per platform?
<AutomatedTester> jgraham (IRC): if platform IMEs handle things differently, then a test per platform?
<AutomatedTester> karlcow (IRC): how do you make this universal?
<AutomatedTester> jgraham (IRC): this is very hard...
<AutomatedTester> ... it won't address all cases but it's an improvement since we have zero way to test
<AutomatedTester> q?
<AutomatedTester> Break for 15 minutes

karlcow commented 2 years ago

A list of the test scenarios this proposal is trying to solve would help in understanding the scope. IME testing can be very large, but knowing the minimum viable context would make it easier to evaluate the effort required to implement this.

jgraham commented 2 years ago

Not directly addressing @karlcow's request yet, but some additional context:

This proposal is intended to work at the level of providing a "virtual IME" whose state is entirely under the control of the test author. Therefore the intent is roughly that if we imagine a general data flow of the form physical input device → OS input handling → IME → browser application, this replaces everything to the left of "browser application" with "virtual IME". So the maximum amount of flexibility we could aim for is to be able to replicate any possible sequence of IME messages/events that the browser could get from the operating system. In practice, of course, this API is not OS-specific and is somewhat higher level, so it's not reasonable to expect it to simulate every possible case in the browser's IME handling. But one way to judge whether we're likely to meet the testing requirements for webapps is to check whether there are any codepaths on the browser side that are commonly triggered by real IMEs but could not be triggered by this API.

What is definitively out of scope is being able to invoke specific real IMEs, or simulate input at the OS/hardware level. Although those things do have significant advantages in some cases, they require a very different approach from the current WebDriver virtual input handling.

w3c / webdriver

IME support in actions #1683