Open zcorpan opened 2 years ago
@zcorpan Thanks for writing this up! Some comments:
enunciate punctuation
This is quite a complex setting, so we'll need to scope out exactly what we want/need here. E.g. different screen readers have different predefined levels, but also some additional customisation on top of that (such as symbols dictionaries in NVDA).
Start reading
I don't know what this command is/would be expected to do. Do you mean starting a say all, to read from the cursor position to the end of the page? Note that we don't currently use that in any ARIA-AT tests.
Move to first status menu in menu bar
Not sure what this refers to. Which menu bar?
Find next/previous misspelled word
We don't currently have any ARIA-AT tests relying on this, and I'm not sure which screen readers even support it in virtual web content. Definitely doesn't seem like a Milestone 4 command to me.
enunciate punctuation
This is quite a complex setting, so we'll need to scope out exactly what we want/need here. E.g. different screen readers have different predefined levels, but also some additional customisation on top of that (such as symbols dictionaries in NVDA).
OK.
Start reading
I don't know what this command is/would be expected to do. Do you mean starting a say all, to read from the cursor position to the end of the page? Note that we don't currently use that in any ARIA-AT tests.
I believe that's what the command does, yes. I don't know if we need it for aria-at, though it might be useful for more general testing of websites or web apps.
Move to first status menu in menu bar
Not sure what this refers to. Which menu bar?
I'm not sure. It doesn't seem relevant for testing web content, so I'll remove it from the list.
Find next/previous misspelled word
We don't currently have any ARIA-AT tests relying on this, and I'm not sure which screen readers even support it in virtual web content. Definitely doesn't seem like a Milestone 4 command to me.
Indeed, I'll remove it.
Thanks!
For Milestone 4, I think we are missing Navigate to the previous element.
I've edited the milestones in OP to reflect our current thinking. In particular:
Based on our conversation in the CG meeting yesterday (minutes), I think we should make the following adjustments to the roadmap:
becomes
This is a proposed roadmap of milestones for the AT Automation API specification (see https://github.com/w3c/aria-at-automation#proposal-specify-a-new-service-to-compliment-webdriver ).
The relative order of the milestones below is somewhat arbitrary, and some could be rearranged or happen in parallel. Any dependencies on other milestones are documented. Security considerations for each milestone are also documented.
MVP is milestones 0 through 3.
Milestone 0: Protocol
Design the architecture, API shape, and protocol.
security
Milestone 1: Settings
Vendor-specific settings (also see #16)
security
Milestone 2: Capture output
API to capture spoken output without changing the TTS voice (also see #24)
security
opt in to API
sandbox (e.g. do not capture output when the expected applications do not have focus)
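The opt-in and sandbox considerations for Milestone 2 could look roughly like the sketch below. It is only an illustration, not a proposed API: `OutputCapture`, `on_speech`, and the application names are hypothetical, and the sketch assumes the screen reader can tell which application has focus when it speaks.

```python
from dataclasses import dataclass, field


@dataclass
class OutputCapture:
    """Hypothetical capture buffer; none of these names come from the roadmap."""

    allowed_apps: set[str]
    enabled: bool = False  # opt in: nothing is captured until this is set
    captured: list[str] = field(default_factory=list)

    def on_speech(self, text: str, focused_app: str) -> None:
        """Assumed hook that the screen reader calls for each utterance."""
        if not self.enabled:
            return  # the user has not opted in to the API
        if focused_app not in self.allowed_apps:
            return  # sandbox: drop output from unexpected applications
        self.captured.append(text)


capture = OutputCapture(allowed_apps={"firefox"})
capture.on_speech("Ignored, capture is not enabled yet", "firefox")
capture.enabled = True
capture.on_speech("Heading level 1, Welcome", "firefox")
capture.on_speech("Password manager unlocked", "keychain")
print(capture.captured)
```

Only the utterance spoken while the allowed application had focus, after opt-in, ends up in `captured`.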
Milestone 3: Keypresses
API to simulate keypresses (also see #12)
security
opt in to API
not HID level simulated keypresses
sandbox (e.g. do not allow sending keypresses when the expected applications do not have focus)
session
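As a rough illustration of the session and sandbox points for Milestone 3, the sketch below rejects keypresses that arrive without a live session or while an unexpected application has focus. Every name in it (`KeypressService`, `send_keys`, `KeypressError`) is hypothetical, and the `delivered` list merely stands in for the screen reader's own input handling, since the roadmap rules out HID-level simulation.

```python
import secrets


class KeypressError(Exception):
    """Raised when a keypress request is refused."""


class KeypressService:
    """Hypothetical service; not an API defined by the roadmap."""

    def __init__(self, allowed_apps):
        self.allowed_apps = set(allowed_apps)
        self.sessions = set()
        # Stand-in for the screen reader's internal input queue
        # (assumption: keys are not injected at the OS/HID level).
        self.delivered = []

    def new_session(self):
        session_id = secrets.token_hex(8)
        self.sessions.add(session_id)
        return session_id

    def end_session(self, session_id):
        self.sessions.discard(session_id)

    def send_keys(self, session_id, keys, focused_app):
        if session_id not in self.sessions:
            raise KeypressError("unknown or ended session")
        if focused_app not in self.allowed_apps:
            raise KeypressError("focus is outside the sandbox")
        self.delivered.append(keys)


service = KeypressService(allowed_apps={"firefox"})
session = service.new_session()
service.send_keys(session, "h", focused_app="firefox")

try:
    service.send_keys(session, "h", focused_app="terminal")
except KeypressError as error:
    rejected = str(error)

service.end_session(session)
```

Tying requests to a session gives the screen reader a clear point at which to stop accepting simulated input, for example when the test run ends.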
Milestone 4: Activate commands
Vendor-specific API to activate commands (also see #12). Example: go to the next heading. At minimum, this should support setting "modes" (as used in aria-at).
security
opt in to API
sandbox
session
exclude access to any security-sensitive commands
Straw-person message structure example:
Return Type:
EmptyResult
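One way to read "exclude access to any security-sensitive commands" is an allowlist on the receiving side. The sketch below is an assumption-heavy illustration and not the straw-person structure referred to above: the JSON field names, the command names, and the choice to model `EmptyResult` as an empty object are all hypothetical.

```python
import json

# Hypothetical command names; a real list would be vendor-specific.
ALLOWED_COMMANDS = {"go_to_next_heading", "set_mode"}


def handle_message(raw: str) -> dict:
    """Accept a command message only if the command is on the allowlist."""
    message = json.loads(raw)
    command = message.get("command")
    if command not in ALLOWED_COMMANDS:
        # Anything not explicitly allowed is refused, including
        # commands that might be security-sensitive.
        return {"id": message.get("id"), "error": "unknown command"}
    # EmptyResult is modelled as an empty JSON object (assumption).
    return {"id": message.get("id"), "result": {}}


ok = handle_message('{"id": 1, "command": "go_to_next_heading"}')
blocked = handle_message('{"id": 2, "command": "read_clipboard"}')
print(ok)
print(blocked)
```

An allowlist fails closed: a command added to a screen reader later is not reachable through the API until someone decides it is safe to expose.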
Milestone 5: Internal state
Depends on: milestone 4
New API to expose internal state or information in screen readers that is not directly exposed to users but is still useful for testing purposes, e.g. virtual focus position or mode (interaction mode vs. reading mode). At minimum, this should support getting the current "mode" (as used in aria-at).
security
Straw-person message structure example:
Return Type: TBD
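To illustrate the minimum for this milestone, the sketch below pairs a hypothetical "set mode" command (Milestone 4) with a "get mode" query. The class and method names are made up for illustration, and returning the mode as a plain string is an assumption, since the return type is still to be decided.

```python
from enum import Enum


class Mode(Enum):
    READING = "reading"
    INTERACTION = "interaction"


class ScreenReaderState:
    """Hypothetical stand-in for a screen reader's internal state."""

    def __init__(self):
        self._mode = Mode.READING  # assumed default for this sketch

    def set_mode(self, mode: Mode) -> None:
        """Milestone 4 style command: change the mode."""
        self._mode = mode

    def get_mode(self) -> str:
        """Milestone 5 style query: report the current mode as a string."""
        return self._mode.value


state = ScreenReaderState()
before = state.get_mode()
state.set_mode(Mode.INTERACTION)
after = state.get_mode()
print(before, after)
```

A test could use such a query to confirm that a mode change actually took effect before sending further commands.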
Milestone 6: Headless mode
Depends on: milestone 2
Turn off output to TTS (headless mode) (also see #13)
security
opt in to API
signal to user somehow that SR is active (visual + audio)?