s3ththompson opened 3 years ago
@jscholes This is for an issue that you originally surfaced for the automation prototype. I believe your concern was that overriding the configuration of the current screen reader to use a different voice could cause confusion and unintentional issues for end-users if, for example, the tests failed and our automation script was unable to return the screen reader to the previous settings for whatever reason. @jugglinmike put together some potential solutions to this and we discussed them on our call today. Minutes are located at: https://www.w3.org/2021/08/30-aria-at-minutes.html
Our proposal is to move forward with the "automation voice + automated toggling" technique for Windows (and potentially other operating systems). Additionally, we can mitigate issues by warning users ahead of time that screen reader settings will be automatically reconfigured, providing some sort of audible update of progress as tests execute, providing a mechanism to abort the tests and return to the previous settings, and instructing the user on how to recover if there is a catastrophic failure and the configuration can't be automatically restored.
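To make the recovery story concrete, here's a rough sketch (in Python, with invented function names, paths, and settings format; the real prototype's configuration mechanism will differ) of how "automated toggling" can write a backup before touching anything and restore it in a `finally` block, so even a failed run leaves an on-disk record the user can recover from:

```python
import json
from pathlib import Path

# All names here are hypothetical stand-ins for whatever the prototype uses.
BACKUP_PATH = Path("screen_reader_settings.backup.json")
applied = []  # log of configurations we "applied", for demonstration only

def read_settings():
    # Placeholder: would read the active screen reader's configuration.
    return {"voice": "user-preferred-voice", "rate": 50}

def write_settings(settings):
    # Placeholder: would push a configuration to the screen reader.
    applied.append(settings)

def run_tests():
    # Placeholder for executing the ARIA-AT test plan.
    pass

def run_with_automation_voice():
    """Swap in the automation voice, run the tests, restore the user's voice.

    The backup file is written *before* anything changes, so a crash mid-run
    still leaves a recovery file the user can apply manually.
    """
    original = read_settings()
    BACKUP_PATH.write_text(json.dumps(original))
    try:
        write_settings({**original, "voice": "ARIA-AT Automation Voice"})
        run_tests()
    finally:
        # Restore even if run_tests() raised; the backup is only deleted
        # once the original configuration is back in place.
        write_settings(original)
        BACKUP_PATH.unlink(missing_ok=True)
```

The design point is the ordering: backup to disk first, restore in `finally`, delete the backup last, so there is no window where a failure strands the user without either a working configuration or a recovery file.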
We would like your feedback on this approach and the other approaches that are listed.
Just to clarify @jscholes, as @mfairchild365 mentioned, we discussed the merits of "automation voice + automated toggling" from a technical perspective, but we wanted to get your input as well as that of other non-sighted users before determining what was acceptable from an accessible experience design perspective.
In particular, before we proceed, I'd like to clarify: whether you are aware of other "Potential approaches" we should consider and whether you can share more about potential safety concerns with disabling and re-enabling a screen reader.
I think it's important to determine, and distinguish between, our desired use cases here.
I initially raised accessibility/inclusivity concerns in relation to: a blind user, relying on a screen reader, wanting to contribute to the ARIA-AT project by helping out with development of the automation stack. That would still seem to be the primary thrust of this thread, given its title.
However, some parts of the behaviour described here could imply additional scope. Specifically: testers actually utilising one or more parts of the automation stack while running tests. In previous CG meetings we've discussed the idea of automatically gathering speech output from a screen reader, for example, instead of relying on testers manually gathering it and pasting it into the results form.
I think this is an important distinction, because what is acceptable for one audience isn't necessarily feasible for the other:
This really leads me onto the two takeaways/questions that I want to end with:
With my developer hat on, I don't know why this approach is considered to be easier than the "Automation voice + forward to built-in voice" one, and I'd love to discuss it in more detail. We're potentially talking about:
This is in comparison to some calls to a built-in SAPI5 voice, which many programs implement without fanfare. I'm sure I'm missing something, which is why I want us to have a discussion about it. The system will only be active for a short time; it doesn't need to be perfect. It just needs to talk. Rate and such can be configured in the OS settings.
I don't want to unnecessarily block progress here, or suggest that certain things are simple if they are in fact anything but. But I do want us to talk about it, because right now the "Automation voice + forward to built-in voice" row of the table consists entirely of question marks. It seems like we can at least make progress on resolving those, even if we end up flipping some of them to "No". And given the sheer number of programs out there that already output to SAPI, I'd be surprised if we can't flip most of them (macOS aside) to "Yes". The comment for footnote #10 reads:
unknown if Windows' built-in voices can be used in this way
What do we need to do to clear up that unknown? I would be surprised if a SAPI5 engine cannot forward speech onto another one; this is very similar to how the SAPI5 version of ETI Eloquence from CodeFactory works. Granted, they're forwarding speech onto another DLL, not a secondary SAPI5 voice. But as long as we don't feed speech from our own engine back into itself, it should be fine.
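The loop-avoidance rule described above (never feed our engine's speech back into itself) is simple to state in code. Below is a minimal sketch in Python with hypothetical voice names; a real SAPI 5 implementation would enumerate voice tokens through COM rather than take a list of strings:

```python
# The automation voice may forward speech to any installed voice except
# itself. Voice names here are invented for illustration.
AUTOMATION_VOICE = "ARIA-AT Automation Voice"

def choose_forwarding_target(installed_voices, preferred=None):
    """Pick a voice to forward speech to, never the automation voice itself."""
    candidates = [v for v in installed_voices if v != AUTOMATION_VOICE]
    if not candidates:
        raise RuntimeError("no other voice installed to forward speech to")
    return preferred if preferred in candidates else candidates[0]
```

For example, `choose_forwarding_target([AUTOMATION_VOICE, "Microsoft David"])` would select `"Microsoft David"`, and a system with only the automation voice installed would fail loudly rather than create a feedback loop.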
CC @sinabahram
Those are very valid points. @jugglinmike and @s3ththompson - what would it take to research the other options further, specifically forwarding to another voice?
Thanks, @jscholes. You're right that there's a lot of uncertainty here, owing largely to my own lack of experience in the domain of Windows programming. The question marks are intended only to document the edges of our understanding, not to preclude any particular direction. Transparency in that regard is helpful because it's one of many factors which influence how we proceed (and indeed, who it is that does the proceeding).
Another factor is the usability implications of the alternatives. Your expertise is especially helpful in sussing that out, so thank you!
In the time since posting this issue, I've made some headway toward the alternative named "Automation voice + forward to built-in voice." The way I've integrated with Microsoft SAPI is primitive (you can see for yourself on the main branch of this repository), but at least the amount of uncertainty has shrunk. On the Bocoup side of things, we're refining the roadmap for this work, so I'm hoping to continue in this direction.
(edited to remove hard line breaks, sorry about that)
@jugglinmike This all sounds great. Thank you for continuing to look into such alternatives. Looking forward to further developments/updates!
The current approach of this AT automation experiment is to create a special "automation voice" that registers as a SAPI 5 voice. Rather than synthesize sound via a text-to-speech engine, the "automation voice" captures the textual content of the vocalization and sends it to a local harness/service, which records the output and uses it to assert whether the vocalization matches an expected string.
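The capture-and-assert flow can be sketched as follows. This is an in-process Python illustration with invented names; the actual prototype is a SAPI 5 engine (C++/COM) talking to a local service, not an object in the same process:

```python
# Minimal model of the harness the "automation voice" reports to: instead of
# synthesizing audio, each utterance's text is recorded for later assertions.
class CaptureHarness:
    """Records each vocalization's text and checks it against expectations."""

    def __init__(self):
        self.utterances = []

    def receive(self, text):
        # Called by the automation voice in place of audio synthesis.
        self.utterances.append(text)

    def assert_spoken(self, expected):
        # True if any recorded utterance contains the expected string.
        return any(expected in u for u in self.utterances)

harness = CaptureHarness()
harness.receive("menu button, collapsed")
harness.assert_spoken("collapsed")  # → True
```

The key property, as the paragraph above notes, is that nothing in this path produces sound, which is exactly why the approach is inaccessible to a screen reader user running it today.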
The "automation voice" is unfinished in that it does not synthesize any sounds, thus by definition it does not yet provide an accessible developer experience. This issue raises a number of potential approaches for making the "automation voice" accessible.
from @jugglinmike
Potential approaches
Screen reader + screen reader
Run the screen reader under test alongside the user's screen reader of choice
Screen reader + screen reader in VM
Run the screen reader under test inside a virtual machine
Screen reader + plugin to retrieve speech data
Integrate with each screen reader's proprietary interface for discerning what it's vocalizing (this may not be available in every screen reader)
Automation voice + automated toggling
Automatically reconfigure the system screen reader prior to executing tests, and restore the original configuration at the tests' completion; demonstrated by this prototype
Automation voice + ability to vocalize
In theory, this prototype could use an open-source C++ library to enunciate words in addition to providing it as text data to the test runner
Automation voice + forward to built-in voice
In theory, this prototype could use the operating system's built-in voices to enunciate words in addition to providing it as text data to the test runner
AssistivLabs
Fellow stakeholder Weston Thayer is building a service which maintains web browsers and screen readers internally and allows clients to visit their own web pages using them; assistivlabs.com
Feasibility
No one approach is known to be suitable for all of the screen readers we intend to support. The following matrix documents our current understanding of what's possible (signified by "yes"), what's not possible (signified by "no"), and what is currently unknown (signified by "?"). Subsequent annotations elaborate on these qualifications.