Create our own Screen Reader

samreid commented 8 years ago

From https://github.com/phetsims/scenery-phet/issues/227

From phetsims/balloons-and-static-electricity#134: we would like to display accessible text on the screen (possibly outside of the sim iframe). To do that, we will need to be able to navigate a virtual cursor across non-focusable (but accessible) items.

@jessegreenberg and I decided to work on this in master (a) because we don't know how long this code will live and (b) to simplify branch management, since this could involve changes in joist, scenery and balloons-and-static-electricity (the last of which is already in a branch).

jessegreenberg commented 8 years ago

This is very much a prototype, but should serve to show the potential for this kind of tool. Here is a deployed version: http://www.colorado.edu/physics/phet/dev/html/balloons-and-static-electricity/1.2.0-accessible-instance.13/screen-reader.html

@samreid also included the Web Speech API! Try it here :sound:: http://www.colorado.edu/physics/phet/dev/html/balloons-and-static-electricity/1.2.0-accessible-instance.13/screen-reader.html?speech

I will wait to work on this further until more details are discussed/reviewed with the team. Initial questions:

- Who will use this? Translators? PhET for presentation purposes? This will partially determine the verbosity of screen reader specific labels.
- Design/layout considerations such as structuring groups of descriptions for aria-labelledby/describedby to cue context.

jessegreenberg commented 8 years ago

The Web Speech API is pretty great, much more powerful than I initially thought. What if we created a prototype screen reader with the Web Speech API that could be used as an alternative to other assistive technologies? The prototype could behave much like the version in https://github.com/phetsims/scenery/issues/538#issuecomment-201066178, but with additional features and navigation strategies so that it behaves like other screen readers for the web. There are downsides to this, but the benefits could be remarkable for web accessibility.

Benefits

- The reader would work directly with the Parallel DOM, exactly like any other screen reader
  - It could be turned off at any time if the user would rather use another AT
- It could work exactly like a screen reader, but better in the context of PhET sims
- It potentially gives us leverage with other accessibility groups if we determine new features or standards that would be beneficial for accessible interactives
- It allows us to pursue testing on a much more reliable platform while we discuss infrastructure issues with other accessibility groups
- We can listen to this reader's output directly, so the output text can be manipulated in any way we desire

Disadvantages

- This would be a prototype, not a production solution. The Web Speech API is a work in progress and is only supported in Chrome, with some support from Firefox with the right user settings.
- We need to investigate how well it performs; there is not yet a guarantee that performance/behavior is sufficient.

samreid commented 8 years ago

caniuse reports that text to speech also works in Safari and Mobile Safari: http://caniuse.com/#feat=speech-synthesis

jessegreenberg commented 8 years ago

Ah, that is even better, thanks! I was going off of the table here, which is likely out of date.

samreid commented 8 years ago

It seems to me that the MDN table is referring to browsers that support both Text to Speech and Speech Recognition, while the caniuse page is referring to browsers that support just the Text to Speech feature.
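Since the two halves of the Web Speech API are supported independently, a page can feature-detect them separately. This is a minimal sketch, not code from the prototype: the function name `detectSpeechSupport` and its `root` parameter are made up for illustration, and `root` exists only so the check can be exercised outside a browser.

```javascript
// Feature-detect text to speech and speech recognition separately.
// `root` defaults to the global object (window in a browser).
function detectSpeechSupport( root = globalThis ) {
  return {

    // Text to speech: the speechSynthesis object plus the utterance constructor
    synthesis: 'speechSynthesis' in root && 'SpeechSynthesisUtterance' in root,

    // Speech recognition: unprefixed, or webkit-prefixed as in Chrome
    recognition: 'SpeechRecognition' in root || 'webkitSpeechRecognition' in root
  };
}

console.log( detectSpeechSupport() );
```

A reader that only speaks would need just the `synthesis` flag, which is the feature the caniuse page above tracks.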

jessegreenberg commented 8 years ago

That makes sense, here is the table for speech recognition, which matches the MDN table.

http://caniuse.com/#search=speech%20recognition

samreid commented 8 years ago

It occurred to me there may be other third-party apps that implement an in-browser screen reader. This one came up in a search; there are probably others: http://www.chromevox.com/

We would probably need to instruct that they should not use a native screen reader (such as NVDA, JAWS or VoiceOver) with a custom in-browser screen reader (whether we write one or leverage a third-party one).

jessegreenberg commented 8 years ago

ChromeVox is great and performs well with PhET sims. I believe that @terracoda did some testing with users of ChromeVox for the web. It does have its share of issues, especially with aria-live.

It would be good to double check some of the web-based screen readers; perhaps they would perform well enough for testing. We haven't spent too much time investigating them because they are not very widely used.

Here are a few that seem worth investigating from a quick search:

- WebAnywhere
- ReciteMe
- ReadSpeaker
- Spoken Web
- Fire Vox
- Automatik Text Reader

jessegreenberg commented 8 years ago

> We would probably need to instruct that they should not use a native screen reader

Yes, that is true, though some users may be uncomfortable with disabling their reader. I can imagine a feature that allows the user to turn our custom screen reader on and off. We could recommend a screen reader that works really well while allowing it to be turned off so that a native device could be used if necessary.

jessegreenberg commented 8 years ago

WebAnywhere is pretty neat: you can paste a link into the search bar and WebAnywhere will act as a screen reader to read the page. It does take a long time to load more complex content, and the synth voice is a bit harsh.

It looks like ReciteMe is actually a commercial group that outfits web pages with accessibility.

jessegreenberg commented 8 years ago

Automatik Text Reader is a plugin that works by selecting text to read with a cursor. It is also a bit slow (at least on my machine); it takes more than 5 seconds before reading a selection of text.

I was unable to get Fire Vox to install :(

jessegreenberg commented 8 years ago

It looks like ReadSpeaker would be used as an API and provides a bit more functionality than the Web Speech API, though it is commercial and likely quite expensive.

jessegreenberg commented 8 years ago

The screen reader with the Web Speech API is coming along and is working very nicely. It performs better in this context than many other screen readers we have used. The next step is to implement additional navigation strategies from sources such as http://webaim.org/resources/shortcuts/nvda or http://webaim.org/resources/shortcuts/jaws. I am curious to hear about additional strategies that the team has observed during user testing; those should probably be prioritized.

We can also begin to think about other important features for the reader, such as a way to configure the keys for navigation. For instance, if a user is used to JAWS, they could select a configuration that matches that navigation set. However, most of the strategies implemented at this time are common to all screen readers. That will not be true for the more reader-specific strategies.

This is also a good point to start thinking of how aria-live should work for multiple updates. What should happen when more than one assertive update fires at once?

samreid commented 8 years ago

> What should happen when more than one assertive update fires at once?

To answer this, could you present examples of situations with assertive updates? For the general case, you could envision assertive messages like this:

A1 B1 A2 B2 C1 A3 A4

where the letter is the type of message and the number is the message index, and we can imagine for the sake of argument that A1 was still reading when the other messages were produced. Here are some possible rules for dealing with multiple updates:

- Queue them and report them all. This could cause a huge (unending) backlog of updates which are very stale.
- Ignore messages that happened while an update was playing.
- Queue messages, but only report the last of each type. For instance in the above messages, after A1 plays, then B2, C1 and A4 would play.
- Generate some kind of summary for queued messages of the same type. For instance, "the speed is 10 mph", "the speed is 20 mph" and "the speed is 40 mph" could coalesce to "the speed increased from 10 mph to 40 mph".

Other options would be possible if we have the ability to interrupt a message while it is playing (I'm not sure whether that is technically feasible).

It will be easier to discuss this if we come up with concrete examples from sims.
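To make the third rule concrete, here is a minimal sketch applied to the example sequence above. It assumes each message carries a `type` and a `text` field; that message shape and the function name `keepLastOfEachType` are hypothetical, not part of any existing code.

```javascript
// `messages` holds everything produced while another utterance (A1 in the
// example) was still being read. Each message is { type, text }.
function keepLastOfEachType( messages ) {
  const latest = new Map();
  for ( const message of messages ) {

    // Deleting first moves the type to the end of the Map's insertion order,
    // so the result is ordered by when each surviving message arrived.
    latest.delete( message.type );
    latest.set( message.type, message );
  }
  return Array.from( latest.values() );
}

const produced = [ 'B1', 'A2', 'B2', 'C1', 'A3', 'A4' ].map( text => ( { type: text[ 0 ], text: text } ) );
console.log( keepLastOfEachType( produced ).map( message => message.text ).join( ' ' ) ); // B2 C1 A4
```

Ordering the survivors by arrival time is a design choice of this sketch; ordering by the first appearance of each type would be equally easy and would give A4 B2 C1 instead.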

jessegreenberg commented 8 years ago

Sure, here is one example of the desired output from Capacitor Lab: Basics -

Imagine that the user just connected the charged capacitor to the light bulb:

- (Capacitor disconnected from battery)
- Capacitor connected to light bulb
- Current is flowing
- Light bulb is lit
- Plate charge is 0.50 pico coulombs
- Stored energy is 0.40 pico joules
- Light bulb is less bright
- Plate charge is 0.35 pico coulombs
- Stored energy is 0.10 pico joules
- Light bulb is less bright
- ( ...and so on )
- Light bulb is dark
- Current is not flowing
- Plate charge is 0 pico coulombs
- Stored energy is 0 pico joules

(Sonification would really simplify things here!)

Another example would be when the simulation is reset and all of the aria-live content changes at once. From a given state in BASE, we might have an output like

- Wall added to play area
- Green balloon has a neutral charge, no more negative charges than positive ones
- Green balloon is in middle of play area, centered between sweater and wall
- Green Balloon removed from play area
- Yellow balloon is in middle of play area, centered between sweater and wall
- Yellow balloon has a neutral charge, no more negative ones than positive ones
- Sweater has a neutral charge, no more negative ones than positive ones

Right now, the behavior of Reader.js is to add a message to the queue if the message is 'polite'. If it is assertive, the active utterance is canceled, all messages are removed from the queue, and the assertive message is read. This is how most screen readers behave.
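The polite/assertive behavior described above can be sketched as follows. This is not the actual Reader.js: `createAnnouncer` and `fakeSynth` are hypothetical names, and `synth` is assumed to be any object with the `speak()` and `cancel()` methods of `window.speechSynthesis`, which keeps its own utterance queue.

```javascript
// `createUtterance` turns text into whatever `synth.speak` expects.
function createAnnouncer( synth, createUtterance ) {
  return function announce( text, politeness ) {
    if ( politeness === 'assertive' ) {

      // cancel() stops the active utterance and empties the synth's queue
      synth.cancel();
    }

    // speak() appends to the synth's queue, so polite messages wait their turn
    synth.speak( createUtterance( text ) );
  };
}

// In a browser that supports speech synthesis:
// const announce = createAnnouncer( window.speechSynthesis, text => new SpeechSynthesisUtterance( text ) );

// Outside a browser, a stand-in synth that only records its queue:
const fakeSynth = {
  queue: [],
  speak( utterance ) { this.queue.push( utterance ); },
  cancel() { this.queue.length = 0; }
};
const announce = createAnnouncer( fakeSynth, text => text );
announce( 'Current is flowing', 'polite' );
announce( 'Light bulb is lit', 'polite' );
announce( 'Capacitor connected to light bulb', 'assertive' );
console.log( fakeSynth.queue ); // [ 'Capacitor connected to light bulb' ]
```

Passing the synth in as a parameter is only there so the sketch can run without a browser; it says nothing about how a real browser behaves when `speak()` immediately follows `cancel()`.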

So perhaps the issue is that we need to specify the priority of 'polite' updates so that we can guarantee the correct order of messages in the queue. This is where screen readers differ: some are LIFO, some are FIFO.

My concern with coalescing messages of a similar type is that it might mask information about the rate of change in these kinds of updates. Coalescing would definitely reduce verbosity, though, and maybe the sim could handle that if necessary.

jessegreenberg commented 8 years ago

Hmm, maybe we don't need to read everything when the sim is reset, if we can assume that the user remembers the initial state?

jessegreenberg commented 8 years ago

Another question: Now that we can use the arrow keys to interact with a 'role=application' element, should we? The trade-off is that the user can no longer read content with the arrow keys and the only way to leave the 'application' element is with 'tab' or 'shift + tab'.

jessegreenberg commented 8 years ago

After playing around with some of the 'live' roles with dynamic descriptions in Balloons and Static Electricity, @samreid's suggestion

> Queue messages, but only report the last of each type. For instance in the above messages, after A1 plays, then B2, C1 and A4 would play

could be a great solution. At the moment, balloon descriptions queue, so as the balloon travels from left to right we have updates like

- The yellow balloon is in the bottom of play area, near wall
- The yellow balloon is in the lower part of play area, closer to wall than sweater
- The yellow balloon is in the middle of play area, sweater is at far left, wall is at far right
- The yellow balloon is in the middle of play area, closer to sweater than wall
- The yellow balloon is in the middle of play area, near sweater
- The yellow balloon is in the middle of sweater

The descriptions can be very stale by the time they are read all the way through. But if the descriptions are fully assertive, they will shut down any other messages (such as the state of the wall, for instance). So if we had a system that queued the last message of each type, that would not be an issue.
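A queue along those lines might look like the sketch below: at most one pending message per type, so a new balloon description replaces a stale one without shutting down the wall's message. The class name `LatestPerTypeQueue` and the type keys are hypothetical; nothing like this exists in the reader yet.

```javascript
class LatestPerTypeQueue {
  constructor() {

    // type -> text, ordered by arrival of the surviving messages
    this.pending = new Map();
  }

  add( type, text ) {
    this.pending.delete( type ); // drop the stale message of this type, if any
    this.pending.set( type, text );
  }

  // Returns the next text to read, or null when nothing is pending.
  next() {
    const first = this.pending.entries().next();
    if ( first.done ) {
      return null;
    }
    const [ type, text ] = first.value;
    this.pending.delete( type );
    return text;
  }
}

const queue = new LatestPerTypeQueue();
queue.add( 'wall', 'Wall added to play area' );
queue.add( 'balloonLocation', 'The yellow balloon is in the middle of play area, near sweater' );
queue.add( 'balloonLocation', 'The yellow balloon is in the middle of sweater' );
console.log( queue.next() ); // Wall added to play area
console.log( queue.next() ); // The yellow balloon is in the middle of sweater
console.log( queue.next() ); // null
```

The reader would call `next()` each time an utterance finishes, so only the freshest description of each type is ever spoken.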

jessegreenberg commented 8 years ago

In the 5/12/16 meeting, it was mentioned that we should see how much the reader and accessible content impact the size of the simulation.

jessegreenberg commented 8 years ago

@terracoda created this list of screen reader hotkeys, which shows the common navigation strategies used by NVDA and JAWS, and highlights their conflicts. https://docs.google.com/document/d/1Qqyyfthh3vIZhhtlc1DC78A50ler71035sl9HUgKg4c/edit?ts=573cdfb9#heading=h.2g1hpfksyljt

At some point we will probably want to support almost all of these.

terracoda commented 8 years ago

@jessegreenberg Both the ESC key and the Close dialog button should be available to close the dialog. The ESC key is currently not working for me on Mac with Safari.

jessegreenberg commented 8 years ago

Good catch @terracoda. That is related to changes I made in https://github.com/phetsims/balloons-and-static-electricity/issues/141, so I am going to move this comment and track the fix there.

jessegreenberg commented 8 years ago

With the commit above, Windows Chrome should no longer skip every other utterance, and Safari should no longer stutter at the beginning of the utterance.

jessegreenberg commented 8 years ago

Above commit includes support for toggle buttons, role=button, and aria-grabbed.

jessegreenberg commented 8 years ago

I am noticing that if an utterance is interrupted in Safari, all following messages behave as if they are assertive.

jessegreenberg commented 8 years ago

For some reason, Safari fires 'start' and 'end' events immediately for all utterances in the queue if they are added to the utterance queue right after synth.cancel() is called. Only the last utterance in the queue is read, and behavior returns to normal once that last utterance finishes being spoken. This is why all utterances seem 'assertive'.

In addition, if an utterance is interrupted with cancel(), the 'end' event never fires.

jessegreenberg commented 8 years ago

Many properties in Safari stop working once cancel() is called for assertive updates. If utterances are added immediately after synth.cancel(), synth.speaking and synth.pending stop updating and the 'start' and 'end' events stop firing.

terracoda commented 8 years ago

@samreid @jessegreenberg I like the following idea discussed above:

> So if we had a system that queued the last message of each type, that would not be an issue.

jessegreenberg commented 7 years ago

This issue can now be closed. Associated code can be found here: https://github.com/phetsims/scenery/tree/master/js/accessibility/reader

The prototype screen reader is now in a place where we can introduce support for additional ARIA roles and attributes as necessary. There is no intention of spending time on this at the moment.

We have gotten better at providing content in the PDOM that is easy for screen readers to navigate, so creating our own reader is not as desirable. It can still be used for demonstrations or research purposes in the future.
