w3c / aria

Accessible Rich Internet Applications (WAI-ARIA)
https://w3c.github.io/aria/

aria-delay to help synchronize synthesized speech + sonification? #2256

Open frankelavsky opened 1 week ago

frankelavsky commented 1 week ago

Description of bug or feature request

I work on accessible data representations, and a fundamental issue in this line of work is coordinating synthesized speech (from a screen reader) with data that is represented with sound or tones (such as sonification). I was wondering whether ARIA could be amended so that screen readers can better follow authorial intent and designers can express a wider range of possible experiences.

The problem is that, from a design standpoint, having a navigable soundscape currently requires one of the following non-optimal approaches:

  1. Sonification and synthesized speech play at the same time, once an element receives focus (causes a cacophony of sound)
  2. After focus, the sonification code guesses when the synthesized speech ends and then tries to play after it (the guess is hard to get right and the experience typically ends up being bad; see the sketch after this list)
  3. Sonification lives on a separate node/element from the description, i.e. tone -> text -> tone -> text (doubles the number of navigation presses and may create confusion or require extra labeling to understand what is paired with what)
  4. Sonification only ever plays when a key or button is pressed (slows down and adds complexity to the sonification experience, and still requires waiting for the speech audio to finish)
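For concreteness, here is a minimal sketch of option 2, the timing guess. Everything in it is an assumption (the `data-sonified` selector, the 180 words-per-minute speech rate, the tone parameters); the point is that the author has no reliable way to know when the announcement actually ends.

```ts
// Option 2 sketch: estimate how long the screen reader will take to announce
// the element's label, then schedule the tone after that estimate.
const audioCtx = new AudioContext();
const ASSUMED_WPM = 180; // assumed speech rate; many screen reader users listen much faster

function playTone(frequencyHz: number, durationMs: number): void {
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.frequency.value = frequencyHz;
  gain.gain.value = 0.2;
  osc.connect(gain).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + durationMs / 1000);
}

function estimateSpeechMs(label: string): number {
  const words = label.trim().split(/\s+/).length;
  return (words / ASSUMED_WPM) * 60_000;
}

document.querySelectorAll<HTMLElement>('[data-sonified]').forEach((el) => {
  el.addEventListener('focus', () => {
    const label = el.getAttribute('aria-label') ?? '';
    // Hope the announcement has finished by then; if the user's speech rate
    // differs, the tone either overlaps the speech or arrives after a pause.
    window.setTimeout(() => playTone(440, 300), estimateSpeechMs(label));
  });
});
```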

I understand that the second option above, while awesome if it works, would require browsers to have some knowledge of screen reader settings (which violates privacy). So likely that will not be possible (unless someone is creative?).

But the real tension is that in a data experience, sonification provides much, much faster comprehension than language audio. It is reasonable to assume that designers of analytical systems will want to allow users to set preferences or opt in to a tone-first experience, so that they can analyze their data as fast as possible (advanced users may just navigate/scan for the tones and not want to wait for speech at all). However, this would mean that speech audio should actually wait for the tone (rather than any of the previously mentioned options). Currently this is impossible.

Today, the only workable path is to let users set preferences or opt into a tone-only experience (no text/description labels at all) and then provide that information either on demand or in an aria-live element that updates after the sonification ends.
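Here is a minimal sketch of that aria-live workaround, with assumed element ids, durations, and pitch mapping (none of this comes from a real library): the focused datum carries no accessible name, only a tone, and the textual description is written to a polite live region once the tone has ended.

```ts
// Assumes markup like: <div id="sonification-status" aria-live="polite"></div>
const audioCtx = new AudioContext();
const liveRegion = document.getElementById('sonification-status')!;

// Play a tone and resolve when it finishes.
function playTone(frequencyHz: number, durationMs: number): Promise<void> {
  return new Promise((resolve) => {
    const osc = audioCtx.createOscillator();
    const gain = audioCtx.createGain();
    osc.frequency.value = frequencyHz;
    gain.gain.value = 0.2;
    osc.connect(gain).connect(audioCtx.destination);
    osc.onended = () => resolve();
    osc.start();
    osc.stop(audioCtx.currentTime + durationMs / 1000);
  });
}

async function onDatumFocus(value: number, description: string): Promise<void> {
  // Arbitrary value-to-pitch mapping for the sketch.
  await playTone(220 + value * 4, 300);
  // Only after the tone ends does the textual description reach the screen
  // reader, via the live region announcement.
  liveRegion.textContent = description;
}
```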

Proposal

I want to propose an ARIA attribute, aria-delay, used like aria-delay=250. The value would be specified in milliseconds so that screen readers know how long to wait before speaking (aria-delay=250 would cause a 250-millisecond delay before speech begins). This would give sonification designers a tone-first option in data navigation experiences.
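A hypothetical sketch of how an author might use the proposed attribute; nothing here is existing ARIA or screen reader behavior, and the markup, function name, and 2000 ms cap (suggested under Limitations below) are all assumptions.

```ts
// Hypothetical authoring pattern: the element plays its tone immediately on
// focus, while the proposed (non-existent) aria-delay attribute asks the
// screen reader to hold the spoken announcement for that many milliseconds.
//
// <div role="img" tabindex="0" aria-label="Q3 revenue: 4.2 million"
//      aria-delay="250" data-sonified></div>

const MAX_DELAY_MS = 2000; // cap suggested under Limitations to avoid very long waits

function markToneFirst(el: HTMLElement, toneDurationMs: number): void {
  const delay = Math.min(Math.round(toneDurationMs), MAX_DELAY_MS);
  el.setAttribute('aria-delay', String(delay)); // hypothetical attribute
}
```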

Limitations and issues

Perhaps to avoid abuse or poor design (such as very long wait times), the delay attribute could be capped at a maximum allowed value (such as 2000 ms) and would only take effect in environments where scripts are enabled.

I also recognize that there is no way for the designer to know whether a screen reader actually honors this attribute, so it might only be advisable to append aria-delay when the user explicitly opts in (via preferences or a control in an app or on a page).
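A minimal sketch of that opt-in pattern, with an assumed `data-sonified` selector and an assumed settings control: the attribute is added only when the user turns a tones-first preference on, and removed when they turn it off.

```ts
// Apply or remove the (hypothetical) aria-delay attribute based on an explicit
// user preference, since authors cannot detect screen reader support for it.
function setTonesFirstPreference(enabled: boolean): void {
  const cells = document.querySelectorAll<HTMLElement>('[data-sonified]');
  cells.forEach((cell) => {
    if (enabled) {
      cell.setAttribute('aria-delay', '250');
    } else {
      cell.removeAttribute('aria-delay');
    }
  });
}

// Wired to an in-app settings checkbox, e.g.:
// document.getElementById('tones-first-toggle')?.addEventListener('change', (e) =>
//   setTonesFirstPreference((e.target as HTMLInputElement).checked));
```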

More context

(This may or may not intersect with the discussion over at #991.)

I've been really interested in navigation experiences for data representations (see my recent work Data Navigator). Unfortunately, it became quite clear that navigable chart elements cannot easily have sonification added to them without creating serious design tensions for users. I left out all sonification demos from my initial project release for this reason.

Fortunately, I am not the only researcher/innovator who is running into these issues when trying to make accessible data experiences:

These projects aim not just to attain some kind of accessibility standard but to really push the limits of quality data representation experiences. Unfortunately, screen reader audio is still quite difficult to design around! I'd like to imagine some kind of path forward that allows authors of sonifications to express their intended audio experiences to AT. Hence my proposal here to y'all in the ARIA WG. Looking forward to the discussion!

Will this require a change to CORE-AAM?

Unsure.

Will this require a change to the ARIA authoring guide?

Possibly.

jnurthen commented 1 week ago

@cookiecrook please take a look and see if there are any related existing issues, like startsmedia