w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-speech-1] pause-before should not apply in most contexts #4870

Open cookiecrook opened 4 years ago

cookiecrook commented 4 years ago

The original PFWG feedback on CSS 3 Speech from 2011 included this comment about pause-before:

We are also concerned that end users will interpret correct implementation of these properties as a severe performance lag. For example, if a user were forced to wait 2 seconds between each heading, the experience would be tedious for TTS users comfortable with machine speech at rates pushing 400 words per minute.

But the CSS WG rejected that comment from the W3C's cross-functional accessibility review group, listing a bulk acceptance (by @michael-n-cooper) of the rejections. However, as I read the resolution, it appears that what was accepted was the decision not to remove the properties, together with adding the following guidance, among other notes:

If you plan to keep this property, we suggest the following: [...snip...] Unequivocally declare that implementors should ignore pause-before values when navigating to an element in the screen reader context, so as to not create the perception of performance lag. e.g., If a screen reader user presses the command to "jump to next heading," speak it immediately. Ignore pause-before immediately after a focus change.

But those notes were never added prior to publishing CSS 3 Speech.

That appears to have been an oversight or miscommunication, so I'm re-raising this as a blocking issue for the republish of CSS 3 Speech to CSS Speech 1, with the additional context below.

pause-before should not apply at all in certain circumstances, depending on how the user got to the element. For example, if a screen reader user performs the keypress for “next heading”, they should hear the speech immediately without delay. Trimming leading silence is somewhat analogous to trimming leading whitespace.

Some screen reader users notice and start to be annoyed if a time-to-utterance delay (leading silence) is greater than 40ms. Most daily screen reader users would notice the delay at about 80–100ms. So allowing page authors to specify delays of several seconds does not make sense in the context where the screen reader user or speak-on-hover user is actively navigating.
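To make the proposed rule concrete, here is a minimal sketch of how a speech renderer might decide the leading silence for an utterance. This is not spec text or any screen reader's actual API: the `Trigger` enum and the `effective_pause_before_ms` function are hypothetical names invented for illustration.

```python
from enum import Enum


class Trigger(Enum):
    """Hypothetical classification of why an element is about to be spoken."""
    USER_NAVIGATION = "user_navigation"        # e.g. "jump to next heading", focus change, speak-on-hover
    CONTINUOUS_READING = "continuous_reading"  # e.g. ebook playback or a "read all" command


def effective_pause_before_ms(authored_ms: float, trigger: Trigger) -> float:
    """Return the leading silence to render before an utterance.

    Assumption: the authored pause-before is dropped entirely when the user
    actively navigated to the element, and honored otherwise.
    """
    if trigger is Trigger.USER_NAVIGATION:
        return 0.0
    return authored_ms


# An authored 2-second pause is dropped on "next heading", kept in "read all".
on_jump = effective_pause_before_ms(2000, Trigger.USER_NAVIGATION)
on_read_all = effective_pause_before_ms(2000, Trigger.CONTINUOUS_READING)
```

Dropping the pause to zero, rather than clamping it to some small value, keeps the time-to-utterance under the 40ms figure mentioned above regardless of what the author specified.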

There are some circumstances where gaps between concatenated utterances in a single rendering are appropriate (e.g. pauses between phrases in an ebook or “read all” context), but because the spec is focused on linear generated audio rather than speech usage in general, it doesn’t adequately represent the contexts where features like pause-before should not apply.

cookiecrook commented 4 years ago

One potential way to resolve this is to resolve #4868 (break the speech media type into several media features) and declare that the "aural box model" properties, including pause-before/pause-after only apply to the linear-audio media feature.

frivoal commented 4 years ago

Even in the case of linear rendition, it would make sense to skip pause-before when reading starts at that particular element. The idea of linear reading need not be exclusive with the idea of following links / using bookmarks to get to the point where the reading starts, and starting with a (potentially long) blank isn't great.

So I agree with you that the text originally proposed (or something to the same effect) needs to be added to the spec (normatively, not as a note), and I don't think it should be limited to the non-linear type of reading that screen readers do.
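For the linear case, the same idea can be sketched as a reading plan in which only the element where reading starts loses its leading pause. This is an illustration under stated assumptions, not proposed spec wording: `plan_linear_reading` and the `(text, pause_before_ms)` tuples are hypothetical, and pause-after is left out to keep the example short.

```python
def plan_linear_reading(elements, start_index=0):
    """Build a list of (leading_silence_ms, text) pairs for linear reading.

    `elements` is a list of (text, pause_before_ms) tuples in document order.
    Assumption: the element where reading starts is spoken immediately, and
    every later element keeps its authored pause-before.
    """
    plan = []
    for offset, (text, pause_before_ms) in enumerate(elements[start_index:]):
        # Only the very first utterance of this reading session drops its pause.
        lead = 0 if offset == 0 else pause_before_ms
        plan.append((lead, text))
    return plan


document = [
    ("Chapter 1", 2000),
    ("Intro text", 500),
    ("Chapter 2", 2000),
]

# Reading from the top: the first heading is not delayed, later pauses are kept.
from_top = plan_linear_reading(document)

# Following a link or bookmark straight to "Chapter 2": no leading blank.
from_bookmark = plan_linear_reading(document, start_index=2)
```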