w3c / aria-at-automation-driver

A WebSocket server which allows clients to observe the text enunciated by a screen reader and to simulate user input

Potential SAPI5 API to check whether speech has been cancelled? #2

Open WestonThayer opened 2 years ago

WestonThayer commented 2 years ago

Hey @jugglinmike, I was poking around ISpTTSEngine::Speak today and noticed that the last argument (ISpTTSEngineSite *pOutputSite) might provide a way to know whether NVDA has "cancelled" speech (i.e., called ISpVoice::Speak with SPF_PURGEBEFORESPEAK; see the relevant line). ISpTTSEngineSite::GetActions provides SPVES_ABORT and SPVES_SKIP flags, and the Windows-classic sample checks them frequently while outputting speech.
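For illustration, here is a minimal JavaScript sketch of decoding the `GetActions` result. The flag values are the ones declared for `SPVESACTIONS` in the SAPI 5 DDK header `sapiddk.h` (worth double-checking against your SDK). `decodeActions` is a hypothetical helper; a real engine would run this check in C++ against the `DWORD` that `GetActions()` returns.

```javascript
// SPVESACTIONS flags as declared in sapiddk.h (SAPI 5 DDK).
const SPVES_CONTINUE = 0;
const SPVES_ABORT = 1 << 0;
const SPVES_SKIP = 1 << 1;
const SPVES_RATE = 1 << 2;
const SPVES_VOLUME = 1 << 3;

// Hypothetical helper: turn the bitmask returned by
// ISpTTSEngineSite::GetActions into named booleans.
function decodeActions(actions) {
  return {
    noActions: actions === SPVES_CONTINUE,
    abort: (actions & SPVES_ABORT) !== 0,
    skip: (actions & SPVES_SKIP) !== 0,
    rateChanged: (actions & SPVES_RATE) !== 0,
    volumeChanged: (actions & SPVES_VOLUME) !== 0,
  };
}

// Example: a pending abort request shows up as the abort flag.
console.log(decodeActions(SPVES_ABORT).abort); // true
```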

Sorry, I haven't tested it myself yet; I'm still getting the dev environment set up. But I thought you might want to know!

WestonThayer commented 2 years ago

Ah, via the SAPI5 porting guide:

> Speak is the main function of the interface - it passes the engine the text to be rendered, an output format to render it in, and an output site to which the engine should write audio data and events. A Speak call should return when either all of the input text has been rendered, or the engine has been told to abort the call by the SpVoice object.
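A minimal sketch of that contract, in the same JavaScript style as the snippet later in this thread: `speak` returns once every fragment has been written, or early as soon as the site reports an abort. The `site` object (with `getActions()` and `write()`) is a hypothetical stand-in for `ISpTTSEngineSite`, not a real binding.

```javascript
const SPVES_ABORT = 1 << 0; // from sapiddk.h

// Hypothetical model of ISpTTSEngine::Speak: check for an abort before
// rendering each fragment, and return as soon as one is requested.
function speak(fragments, site) {
  for (const text of fragments) {
    if (site.getActions() & SPVES_ABORT) {
      return "aborted";
    }
    site.write(text); // stand-in for writing audio to the output site
  }
  return "completed";
}

// Usage: a fake site that requests an abort after the first write.
const written = [];
const fakeSite = {
  getActions: () => (written.length >= 1 ? SPVES_ABORT : 0),
  write: (text) => written.push(text),
};
console.log(speak(["one", "two", "three"], fakeSite)); // prints: aborted
```

Note that the abort check only runs while `speak` is executing; once it has returned, nothing is polling `getActions()`, so a later cancellation goes unobserved.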

jugglinmike commented 2 years ago

Hey, @WestonThayer, thanks for the tip! I discovered that feature last week and only just completed a proof-of-concept about an hour ago. The relevant code is here.

Unfortunately, this doesn't address the problem we discussed during our call yesterday. That's because the voice is only able to detect cancellation if it is actively speaking (or in more technical terms: if it is blocking the thread which invoked Speak).

Put another way: we can't label the "urgency" of a given vocalization deterministically; we can only observe whether interruption occurs in practice.

One way to handle this might be to insert an artificial delay of arbitrary duration after a vocalization. We'd essentially be creating a window of time where an automated test could perceive interruption.
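A rough sketch of that window on the voice side, assuming the voice can keep its `Speak` call blocked after rendering: poll for the abort flag for `holdMs` milliseconds before returning. `holdAndWatch`, `holdMs`, `pollMs`, and the `site.getActions()` binding are all hypothetical; the duration is exactly the arbitrary knob discussed here.

```javascript
const SPVES_ABORT = 1 << 0; // from sapiddk.h

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical: after the text has been rendered, keep the Speak call
// "open" for holdMs so a cancellation arriving shortly afterwards is seen.
async function holdAndWatch(site, holdMs, pollMs = 10) {
  const deadline = Date.now() + holdMs;
  while (Date.now() < deadline) {
    if (site.getActions() & SPVES_ABORT) {
      return "interrupted";
    }
    await sleep(pollMs);
  }
  // Any abort after the deadline is never observed: this is the
  // false-negative risk of a fixed window.
  return "not interrupted";
}
```

When no interruption happens, every vocalization pays the full `holdMs`, which is where the extra test duration comes from.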

This solution has some drawbacks, though. First, inserting a delay like this will increase overall test duration. Second, there's some risk of false negatives, where interruption occurs after the delay we implement has elapsed (this system wouldn't recognize such a vocalization as interrupted at all).

Due to this, it's not clear whether the testing system should perform this "interruption verification" step as a matter of course for every vocalization, or whether it should only be applied in specific circumstances (for instance, through an explicit assertion that test authors use only in cases where interruption/non-interruption is especially important).

I think it would be easier to answer this question if we had a sense for the likelihood of errors in "interruptiveness."

WestonThayer commented 2 years ago

Awesome! BTW, how did you decide that the Bookmark callback should be implemented? Was it causing a specific issue?

For the false negative issue, I imagine that could be solved through careful orchestration between the test runner and SAPI driver. I agree it's probably not necessary to have all the time, but if we wanted to test an assertive live region, I imagine we could have complete control in the test runner like so:

```javascript
// Open a test page with a button that triggers an assertive live region alert.

// First, the SR needs to be reading something else, so read a heading on the page.
// Before that, prime the SAPI driver to "speak" until we tell it to stop, putting the SR "on hold".
await speechDriver.setSpeechHold();

// Now move to the heading and wait for the SR to have called ::Speak().
await keyboard.press("h");
await speechDriver.waitForSpeechStart("Heading Lorem Ipsum");

// OK, SAPI is "speaking"; trigger the live region (call the browser directly,
// no need to involve the SR).
await browser.findElement("#live-region-btn").click();

// Make sure speech was cancelled. If the SAPI driver receives the abort signal,
// it automatically releases the speech hold we set above.
await speechDriver.waitForSpeechAbort();

// Now just make sure we get new speech from the assertive live region.
await speechDriver.waitForSpeechStart("live region");
```
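To make the proposed flow concrete, here is one possible in-memory shape for `speechDriver`. Everything in it is hypothetical: the three test-facing methods come from the snippet above, while `onSpeak`, `onAbort`, and `isHolding` are invented voice-side hooks that a real driver would call from its `ISpTTSEngine::Speak` implementation.

```javascript
// Hypothetical in-memory model of the proposed speechDriver API.
class SpeechDriver {
  constructor() {
    this.holding = false;
    this.waiters = []; // pending { kind, match, resolve } entries
  }

  // Test side: ask the voice to keep "speaking" until aborted.
  async setSpeechHold() {
    this.holding = true;
  }

  // Test side: resolve with the spoken text once Speak is called with
  // text containing `match`. Waiters must be registered before the
  // speech happens; a real driver would also need to buffer past events.
  waitForSpeechStart(match) {
    return new Promise((resolve) => {
      this.waiters.push({ kind: "start", match, resolve });
    });
  }

  // Test side: resolve when the voice observes SPVES_ABORT.
  waitForSpeechAbort() {
    return new Promise((resolve) => {
      this.waiters.push({ kind: "abort", match: null, resolve });
    });
  }

  // Voice side: whether Speak should keep blocking after rendering.
  isHolding() {
    return this.holding;
  }

  // Voice side: called when Speak begins with the given text.
  onSpeak(text) {
    this.settle("start", (w) => text.includes(w.match), text);
  }

  // Voice side: called when GetActions() reports an abort; this releases
  // the hold so the blocked Speak call can return.
  onAbort() {
    this.holding = false;
    this.settle("abort", () => true, undefined);
  }

  // Resolve and remove every pending waiter of the given kind that matches.
  settle(kind, predicate, value) {
    const remaining = [];
    for (const w of this.waiters) {
      if (w.kind === kind && predicate(w)) {
        w.resolve(value);
      } else {
        remaining.push(w);
      }
    }
    this.waiters = remaining;
  }
}
```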
jugglinmike commented 2 years ago

> Awesome! BTW, how did you decide that the Bookmark callback should be implemented? Was it causing a specific issue?

Speak may be invoked with any number of discrete strings of text, interleaved with instructions to send the operating system a "bookmark" signal. In Microsoft's SAPI, the bookmark is the mechanism for a voice to communicate its progress through the text. That's how the OS knows when to send along queued-up vocalizations, such as those from "polite" aria-live regions. That's true even when the "series" of strings provided to Speak has just one item. Before I implemented that part of the protocol, SAPI was not sending the voice any text from aria-live regions.
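A simplified sketch of that fragment walk, assuming a JavaScript-side model of the data SAPI passes to `Speak`. In SAPI itself, the fragments form a linked list of `SPVTEXTFRAG` structures whose `State.eAction` distinguishes `SPVA_Speak` from `SPVA_Bookmark`, and progress is reported back through the site's `AddEvents` as `SPEI_TTS_BOOKMARK` events. The plain-object shapes and the `write`/`addEvent` methods below are hypothetical.

```javascript
// Hypothetical model of the fragments SAPI hands to Speak: text to
// render, interleaved with bookmarks marking progress points.
function renderFragments(fragments, site) {
  for (const frag of fragments) {
    switch (frag.action) {
      case "speak":
        site.write(frag.text); // stand-in for writing audio
        break;
      case "bookmark":
        // Tell SAPI the voice has reached this point. (Real SPEVENTs also
        // carry an audio-stream offset, omitted here.) Per the comment
        // above, without these events SAPI sent no aria-live text at all.
        site.addEvent({ type: "SPEI_TTS_BOOKMARK", name: frag.name });
        break;
      default:
        // Other actions (silence, pronounce, spell-out, ...) are ignored here.
        break;
    }
  }
}

// Usage with an assumed input: one string followed by one bookmark.
const log = [];
const site = {
  write: (text) => log.push(`speak:${text}`),
  addEvent: (event) => log.push(`event:${event.type}:${event.name}`),
};
renderFragments(
  [
    { action: "speak", text: "Heading Lorem Ipsum" },
    { action: "bookmark", name: "1" },
  ],
  site,
);
console.log(log);
```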

Thanks for asking--that was a hard-won discovery :)

> For the false negative issue, I imagine that could be solved through careful orchestration between the test runner and SAPI driver. I agree it's probably not necessary to have all the time, but if we wanted to test an assertive live region, I imagine we could have complete control in the test runner like so:

That heuristic seems like a good way to assert that interruption occurs. Verifying that interruption does not occur seems fundamentally imprecise to me, since we'd technically have to wait until the end of time to make a definitive statement. Probably we'll just choose some time limit for practicality's sake. (Internally at Bocoup, we've discussed how if the voice got through all of Melville's Moby-Dick, we could be reasonably certain that the screen reader wasn't going to interrupt it.)
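One way to encode that practical time limit on the test-runner side is to race the abort signal against a timer. `expectNoInterruption` and its `limitMs` parameter are hypothetical; `abortPromise` is assumed to come from something like the `waitForSpeechAbort()` call proposed earlier in the thread.

```javascript
// Hypothetical helper: report "not interrupted" if no abort arrives
// within limitMs. A later abort is simply not seen, so the verdict is
// only as strong as the chosen limit.
function expectNoInterruption(abortPromise, limitMs) {
  let timer;
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve("not interrupted"), limitMs);
  });
  const aborted = abortPromise.then(() => "interrupted");
  // Whichever settles first wins; clear the timer either way.
  return Promise.race([aborted, timedOut]).finally(() => clearTimeout(timer));
}
```

The choice of `limitMs` is the judgment call: longer limits give more confidence but slow every test that uses the assertion.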

WestonThayer commented 2 years ago

Hah, if there was a 🐳 reacji, I'd use it!

jugglinmike commented 2 years ago

Some people search their whole lives for the white :whale: reacji