AT Automation API Roadmap

zcorpan commented 2 years ago

This is a proposed roadmap of milestones for the AT Automation API specification (see https://github.com/w3c/aria-at-automation#proposal-specify-a-new-service-to-compliment-webdriver )

The relative order of the milestones below is somewhat arbitrary, and some could be rearranged or happen in parallel. Any dependencies on other milestones are documented. Security considerations for each milestone are also documented.

[x] Milestone 0: Protocol #19
[x] Milestone 1: Settings #22
[x] Milestone 2: Capture output #25
[x] Milestone 3: Keypresses #26
[ ] Milestone 4: Activate commands
[ ] Milestone 5: Internal state
[ ] Milestone 6: Headless mode

MVP is milestones 0 through 3.

Milestone 0: Protocol

Design an architecture, API shape, and protocol.

security

opt in to API
use an existing widely supported network protocol (e.g. WebSocket, like WebDriver BiDi)
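
As a rough illustration of the protocol shape, the sketch below frames a command as a JSON text message of the kind that could be sent over a WebSocket. The id/method/params envelope loosely follows WebDriver BiDi commands. The helper names and the "example:noop" method are hypothetical and not part of any agreed design.

```python
import itertools
import json

# Monotonic command ids so a response could be matched to its request.
_next_id = itertools.count(1)


def encode_command(method, params=None):
    """Serialize a command as a JSON text frame (hypothetical envelope)."""
    return json.dumps(
        {"id": next(_next_id), "method": method, "params": params or {}}
    )


def decode_message(frame):
    """Parse a JSON text frame and check that it names a method."""
    message = json.loads(frame)
    if "method" not in message:
        raise ValueError("message has no 'method' field")
    return message


# "example:noop" is a made-up method name used only for illustration.
frame = encode_command("example:noop")
message = decode_message(frame)
```

Keeping the envelope as plain JSON text means any existing WebSocket client could carry it without a custom transport.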

Milestone 1: Settings

Vendor-specific settings (also see #16)

security

opt in to API

Milestone 2: Capture output

API to capture spoken output without changing the TTS voice (also see #24)

security

opt in to API
sandbox (e.g. do not capture output when the expected applications do not have focus)
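
The sandbox idea above could be sketched as a capture buffer that only records speech while an expected application has focus. This is a minimal sketch. The class name, the on_speech hook, and the way the focused application is identified are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OutputCapture:
    """Hypothetical capture buffer gated on which application has focus."""

    allowed_apps: set
    captured: list = field(default_factory=list)

    def on_speech(self, text, focused_app):
        """Record spoken text only while an expected application has focus.

        Returns True if the text was captured, False if it was dropped.
        """
        if focused_app in self.allowed_apps:
            self.captured.append(text)
            return True
        return False


capture = OutputCapture(allowed_apps={"firefox"})
capture.on_speech("heading level 1", focused_app="firefox")
capture.on_speech("password manager", focused_app="keychain")  # dropped
```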

Milestone 3: Keypresses

API to simulate keypresses (also see #12)

security

opt in to API
not HID level simulated keypresses
sandbox (e.g. do not allow sending keypresses when the expected applications do not have focus)
session
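
One way to picture the combination of "sandbox" and "session" for keypresses is shown below. This is a sketch under assumptions: the Session class, its method names, and the KeypressRejected exception are hypothetical, and the keys are only recorded here rather than delivered to a real screen reader.

```python
class KeypressRejected(Exception):
    """Raised when a simulated keypress is refused (hypothetical)."""


class Session:
    """Hypothetical session that scopes simulated keypresses."""

    def __init__(self, expected_apps):
        self.expected_apps = set(expected_apps)
        self.active = True
        self.sent = []  # stand-in for delivery to the screen reader

    def press_keys(self, keys, focused_app):
        # Keypresses are only valid within a live session.
        if not self.active:
            raise KeypressRejected("session has ended")
        # Sandbox: refuse when an unexpected application has focus.
        if focused_app not in self.expected_apps:
            raise KeypressRejected(f"{focused_app!r} is not an expected application")
        self.sent.append(tuple(keys))

    def end(self):
        self.active = False


session = Session(expected_apps=["firefox"])
session.press_keys(["Control", "Home"], focused_app="firefox")
```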

Milestone 4: Activate commands

Vendor-specific API to activate commands (also see #12). Example: go to the next heading. At minimum, this covers setting "modes" (as used in aria-at).

security

opt in to API
sandbox
session
exclude access to any security-sensitive commands

Straw-person message structure example:

{
  "method": "nvda:activateCommand",
  "params": {
    "command": "change to browse mode"
  }
}

Return Type: EmptyResult
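
A client-side helper for building this straw-person message, with a check that excludes security-sensitive commands, might look like the following sketch. The helper name, the vendor parameter, and the contents of the blocklist are assumptions; which commands count as security-sensitive has not been decided in this issue.

```python
# Hypothetical blocklist; the real set of sensitive commands is undecided.
BLOCKED_COMMANDS = frozenset({"open settings"})


def activate_command_message(command, vendor="nvda"):
    """Build a vendor-prefixed activateCommand message as a dict."""
    if command in BLOCKED_COMMANDS:
        raise PermissionError(f"command {command!r} is excluded")
    return {
        "method": f"{vendor}:activateCommand",
        "params": {"command": command},
    }


message = activate_command_message("change to browse mode")
```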

Milestone 5: Internal state

Depends on: milestone 4

New API to expose internal state or information in screen readers that is not directly exposed to users but is still useful for testing purposes, e.g. virtual focus position or mode (interaction mode vs. reading mode). At minimum, this covers getting the current "mode" (as used in aria-at).

security

opt in to API
exclude access to any security-sensitive information

Straw-person message structure example:

{
  "method": "nvda:getState",
  "params": {
    "state": "mode"
  }
}

Return Type: TBD
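
On the screen reader side, handling such a request with an allowlist of exposed state could be sketched as below. The handler name, the allowlist contents, and the internal_state dictionary are hypothetical. Since the return type is still TBD, the sketch simply returns the raw stored value rather than inventing a result shape.

```python
# Hypothetical allowlist: only state explicitly listed here is exposed.
EXPOSED_STATE = frozenset({"mode"})


def handle_get_state(message, internal_state):
    """Return the requested state value if it is on the allowlist."""
    name = message["params"]["state"]
    if name not in EXPOSED_STATE:
        raise PermissionError(f"state {name!r} is not exposed")
    return internal_state[name]


request = {"method": "nvda:getState", "params": {"state": "mode"}}
# "clipboard" stands in for security-sensitive information that stays hidden.
value = handle_get_state(request, {"mode": "browse", "clipboard": "secret"})
```

An allowlist rather than a blocklist is used here so that newly added internal state stays hidden until someone opts it in.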

Milestone 6: Headless mode

Depends on: milestone 2

Turn off output to TTS (headless mode) (also see #13)

security

opt in to API
signal to the user somehow that the screen reader is active (visual + audio)?

jscholes commented 2 years ago

@zcorpan Thanks for writing this up! Some comments:

enunciate punctuation

This is quite a complex setting, so we'll need to scope out exactly what we want/need here. E.g. different screen readers have different predefined levels, but also some additional customisation on top of that (such as symbols dictionaries in NVDA).

Start reading

I don't know what this command is/would be expected to do. Do you mean starting a say all, to read from the cursor position to the end of the page? Note that we don't currently use that in any ARIA-AT tests.

Move to first status menu in menu bar

Not sure what this refers to. Which menu bar?

Find next/previous misspelled word

We don't currently have any ARIA-AT tests relying on this, and I'm not sure which screen readers even support it in virtual web content. Definitely doesn't seem like a Milestone 4 command to me.

zcorpan commented 2 years ago

> enunciate punctuation
>
> This is quite a complex setting, so we'll need to scope out exactly what we want/need here. E.g. different screen readers have different predefined levels, but also some additional customisation on top of that (such as symbols dictionaries in NVDA).

OK.

> Start reading
>
> I don't know what this command is/would be expected to do. Do you mean starting a say all, to read from the cursor position to the end of the page? Note that we don't currently use that in any ARIA-AT tests.

I believe that's what the command does, yes. I don't know if we need it for aria-at, though it might be useful for more general testing of websites or web apps.

> Move to first status menu in menu bar
>
> Not sure what this refers to. Which menu bar?

I'm not sure. It doesn't seem relevant for testing web content, so I'll remove it from the list.

> Find next/previous misspelled word
>
> We don't currently have any ARIA-AT tests relying on this, and I'm not sure which screen readers even support it in virtual web content. Definitely doesn't seem like a Milestone 4 command to me.

Indeed, I'll remove it.

Thanks!

mfairchild365 commented 2 years ago

For Milestone 4, I think we are missing Navigate to the previous element.

zcorpan commented 2 years ago

I've edited the milestones in OP to reflect our current thinking. In particular:

Milestone 1 (settings) is now vendor-specific and can include all settings (except any excluded for security reasons)
Milestone 4 (activate commands) is similarly vendor-specific
Removed milestones 6 and 7 (previously "more settings" and "more commands")
Milestones 0 through 3 should represent a good MVP

zcorpan commented 2 years ago

Based on our conversation in the CG meeting yesterday (minutes), I think we should make the following adjustments to the roadmap:

Milestone 4: Activate commands
Milestone 5: Headless mode
Milestone 6: Internal state

becomes

Milestone 4: Activate commands - vendor-specific commands, at minimum setting "modes" (as used in aria-at)
Milestone 5: Internal state - expose vendor-specific state, at minimum getting the current "mode" (as used in aria-at)
Milestone 6: Headless mode
