API based on simulating keypresses vs. invoking discrete commands

zcorpan commented 2 years ago

From the ARIA-AT automation meeting on March 14, 2022: w3c/aria-at-automation#17 (minutes)

A screen reader automation API could allow simulating keypresses, simulating the invocation of discrete screen reader commands, or both. Commands may be more robust or useful when you're writing a test directly to perform a task. Keypresses may be useful as they allow for recording actual user input.
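To make the distinction concrete, here is a minimal TypeScript sketch of the two request styles. All names (`PressKeysRequest`, `InvokeCommandRequest`, `keymaps`, `toKeypresses`) are hypothetical and not taken from any spec. The key combinations are illustrative defaults for "next heading" (VoiceOver's VO-Command-H, NVDA's browse-mode H). It also shows one possible relationship between the styles: a command can be lowered to AT-specific keypresses, whereas a recording tool would capture the keypresses themselves.

```typescript
// Hypothetical request shapes; not taken from the AT Driver spec.

// Style 1: simulate raw keypresses (what a recording tool would capture).
interface PressKeysRequest {
  method: "pressKeys";
  keys: string[];
}

// Style 2: invoke a discrete screen reader command by name.
interface InvokeCommandRequest {
  method: "invokeCommand";
  command: string;
}

type AtRequest = PressKeysRequest | InvokeCommandRequest;

// Illustrative per-AT keymaps: one intent, different key combinations.
const keymaps: Record<string, Record<string, string[]>> = {
  voiceover: { nextHeading: ["Control", "Option", "Command", "h"] },
  nvda: { nextHeading: ["h"] }, // browse mode
};

// Lower a command into keypresses for a given AT, if the keymap knows it.
function toKeypresses(req: InvokeCommandRequest, at: string): PressKeysRequest | undefined {
  const keys = keymaps[at]?.[req.command];
  return keys ? { method: "pressKeys", keys } : undefined;
}

// Human-readable summary of either request style.
function describe(req: AtRequest): string {
  return req.method === "pressKeys" ? `press ${req.keys.join("+")}` : `invoke ${req.command}`;
}
```

A test written against `invokeCommand` stays the same across ATs, while a `pressKeys` test is tied to one AT's key bindings; that trade-off is the one raised above.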

Is perfect accuracy possible or even desirable?
How much of the keypress handling is even within the domain of the AT to control/simulate?
How do we support alternative gestures or input devices?

cc @cookiecrook @mcking65 @s3ththompson @aleventhal

aleventhal commented 2 years ago

Good Q's. I don't have all the answers, but wanted to make sure we record the use case here.

Use case: development of an input & output recording tool that can be used by expert screen reader users to develop tests, without requiring programming skills. This could vastly increase the number of tests we can create.

zcorpan commented 1 year ago

@cookiecrook said in our previous meeting that simulating keypresses in a way that VO acts on them is not possible on macOS for security reasons.

He later followed up by email, saying that it is possible with AppleScript for non-sandboxed, unsigned executables.

So, what does this mean for the AT Driver spec? Should we not include a keypress API?

cookiecrook commented 1 year ago

To clarify, I mentioned Apple had no near-term plans to ship a keypress API for VO through a supported, secure means such as XCTest.

However, Michael Fairchild mentioned @ckundo’s https://github.com/AccessLint/screenreaders project which leverages System Events as the keypress driver. This requires the automation system owner to put the system into a less secure state, which may suffice for the context of ARIA-AT. Though it’s an unsupported method for automating VO, I think it could be a reasonable implementation for your proposed keypress API. HTH. Thanks.
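For illustration, a minimal sketch of the general System Events approach (not the AccessLint project's actual code): build the arguments for `osascript` that ask System Events to send a key code with VoiceOver's default VO modifiers (Control+Option). The function name `systemEventsKeyArgs` is hypothetical; key code 124 is the macOS virtual key code for Right Arrow, so the example corresponds to VO-Right Arrow (move to the next item). The sketch only constructs arguments and runs nothing.

```typescript
// Modifier names accepted by the `using` clause of System Events' `key code` command.
type Modifier = "control down" | "option down" | "command down" | "shift down";

// VoiceOver's default "VO" modifier combination is Control+Option.
const VO: Modifier[] = ["control down", "option down"];

// macOS virtual key code for the Right Arrow key (kVK_RightArrow).
const RIGHT_ARROW = 124;

// Hypothetical helper: build the `osascript` argument list that asks
// System Events to press a key code. It does not execute anything.
function systemEventsKeyArgs(keyCode: number, modifiers: Modifier[] = []): string[] {
  if (!Number.isInteger(keyCode) || keyCode < 0) {
    throw new RangeError(`invalid key code: ${keyCode}`);
  }
  const using = modifiers.length > 0 ? ` using {${modifiers.join(", ")}}` : "";
  return ["-e", `tell application "System Events" to key code ${keyCode}${using}`];
}

// VO-Right Arrow: move the VoiceOver cursor to the next item.
const nextItemArgs = systemEventsKeyArgs(RIGHT_ARROW, VO);
```

On a Mac where the calling process has been granted Accessibility permission (the "less secure state" mentioned above), these arguments could be passed to `osascript`, e.g. via Node's `child_process.execFileSync("osascript", nextItemArgs)`.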

jugglinmike commented 4 days ago

Here and elsewhere, folks have raised security concerns about any automation API that simulates HID (human interface device) input. The discussion above considers an API built around "commands" (and elsewhere, "user intents") as a safer alternative. We currently feel such an API could be feasible if it includes a means to fill text into form fields. Here, I'll explain why the capability is necessary and propose a definition which may avoid the risks of HID-level simulation.

The problem with user gestures

An API that is limited to high-level user gestures would be unable to simulate interactions like filling in form fields.

Rejected solution: delegate to WebDriver

It might be possible to circumvent this deficiency using WebDriver's "element send keys" command (since keypresses could be simulated in the browser directly), but only if browsers were aware of the location of the ATs' virtual cursors at all times.

Unfortunately, this is not the case.

If AT Driver's use-case for "form filling" is to be facilitated by WebDriver, then AT Driver would need a mechanism for conveying the target element to WebDriver. We feel that a coupling like that would dramatically increase the complexity of AT Driver and decrease its likelihood of implementation.
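For reference, WebDriver's Element Send Keys command is `POST /session/{session id}/element/{element id}/value` with a JSON body of the form `{"text": ...}`. The sketch below (type and function names are hypothetical) shows where the coupling would come from: the request cannot be formed without a WebDriver element reference, which is exactly what AT Driver would have to derive from the AT's virtual cursor position.

```typescript
// Shape of an HTTP request to a WebDriver (classic) endpoint.
interface WebDriverRequest {
  method: "POST";
  path: string;
  body: { text: string };
}

// Hypothetical helper: build an Element Send Keys request.
// `elementId` is a WebDriver element reference; AT Driver has no way to
// obtain one for the element under the AT's virtual cursor.
function elementSendKeys(sessionId: string, elementId: string | undefined, text: string): WebDriverRequest {
  if (elementId === undefined) {
    throw new Error("no WebDriver element reference for the AT's virtual cursor position");
  }
  return {
    method: "POST",
    path: `/session/${encodeURIComponent(sessionId)}/element/${encodeURIComponent(elementId)}/value`,
    body: { text },
  };
}
```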

Proposed solution: "send text" command

Instead, we propose a command which allows clients to specify a sequence of characters to be entered into the currently-focused form field.

While this solution has similarities to the original HID-simulation approach, its differences preclude malicious applications without impacting the desirable use-case:

Modifier keys (e.g. Alt, Command, Shift, or Meta) could not be sent
The implementation could reject presses at its discretion (e.g. if the target form field did not belong to an accredited process such as a web browser)
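A minimal sketch of how an implementation might validate such a "send text" command, assuming a hypothetical `SendTextParams` payload and a hypothetical description of the focused field. The allowlist of "accredited" processes is purely illustrative. Besides having no way to express modifier keys, the sketch conservatively rejects C0/C1 control characters (e.g. Escape) and Private Use Area code points, since WebDriver encodes special keys such as Shift (U+E008) in that range.

```typescript
// Hypothetical "send text" payload: only a string of characters, no key events.
interface SendTextParams {
  text: string;
}

// Hypothetical description of the currently focused form field.
interface FocusedField {
  processName: string;
  editable: boolean;
}

// Illustrative allowlist; a real implementation would decide this at its discretion.
const ACCREDITED = new Set(["firefox", "chrome", "safari"]);

type SendTextResult = { ok: true; text: string } | { ok: false; reason: string };

// Reject control characters and Private Use Area code points (WebDriver key encodings).
const FORBIDDEN = /[\u0000-\u001f\u007f-\u009f\ue000-\uf8ff]/;

function validateSendText(params: SendTextParams, focused: FocusedField | null): SendTextResult {
  if (!focused || !focused.editable) {
    return { ok: false, reason: "no focused form field" };
  }
  if (!ACCREDITED.has(focused.processName)) {
    return { ok: false, reason: `process not accredited: ${focused.processName}` };
  }
  if (FORBIDDEN.test(params.text)) {
    return { ok: false, reason: "control or special-key characters are not allowed" };
  }
  return { ok: true, text: params.text };
}
```

Rejecting every control character also rules out newlines in multi-line fields; whether to allow them is an open design choice.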

@cookiecrook, you’ve represented Apple's security concerns on this issue over the past few years. Could you weigh in on whether the API I've sketched out above would pass muster?
