Open cookiecrook opened 4 years ago
One potential way to resolve this is to resolve #4868 (break the speech media type into several media features) and declare that the "aural box model" properties, including pause-before/pause-after, only apply to the linear-audio media feature.
Even in the case of linear rendition, it would make sense to skip pause-before when reading starts at that particular element. The idea of linear reading need not be exclusive with following links or using bookmarks to get to the point where the reading starts, and starting with a (potentially long) blank isn't great.
So I agree with you that the text originally proposed (or something to the same effect) needs to be added to the spec (normatively, not as a note), and I don't think it should be limited to the non-linear type of reading that screen readers do.
The original PFWG feedback on CSS 3 Speech from 2011 included this comment about pause-before.
But the CSS WG rejected that comment from the W3C's cross-functional accessibility review group, listing a bulk acceptance (by @michael-n-cooper) of the rejections. However, as I read the resolution, it appears that what was accepted was not removing the properties, but adding the following guidance, among other notes.
But those notes were never added prior to publishing CSS 3 Speech.
That appears to have been an oversight or miscommunication, so I'm re-raising this as a blocking issue for the republish of CSS 3 Speech to CSS Speech 1, with the additional context below.
pause-before should not apply at all in certain circumstances, depending on how the user got to the element. For example, if a screen reader user performs the keypress for “next heading”, they should hear the speech immediately without delay. Trimming leading silence is somewhat analogous to trimming leading whitespace.
Some screen reader users notice and start to be annoyed if a time-to-utterance delay (leading silence) is greater than 40ms. Most daily screen reader users would notice the delay at about 80–100ms. So allowing page authors to specify delays of several seconds does not make sense in the context where the screen reader user or speak-on-hover user is actively navigating.
There are some circumstances where gaps between concatenated utterances in a single rendering make sense (e.g. pauses between phrases in an ebook or “read all” context), but because the spec is focused on linear generated audio rather than speech usage in general, it doesn’t adequately represent the contexts where features like pause-before should not apply.
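To make the distinction concrete, here is a rough sketch of how a user agent or screen reader might decide whether an authored pause-before is honored. This is not spec text or any existing API: the `ReadingContext` type, its `kind` values, and the `effectivePauseBeforeMs` function are all hypothetical names made up for illustration, and the rule it encodes (drop the leading pause on user navigation and at the start of a linear reading) is just one reading of the guidance discussed in this thread.

```typescript
// Hypothetical model of how the user reached the element being spoken.
type ReadingContext =
  // Continuous rendering, e.g. an ebook or a "read all" command.
  | { kind: "linear"; isFirstUtterance: boolean }
  // The user jumped here directly, e.g. "next heading" or speak-on-hover.
  | { kind: "user-navigation" };

// Returns the leading silence (in ms) to render before the utterance.
// Assumption: authorPauseMs is the already-computed pause-before value.
function effectivePauseBeforeMs(
  authorPauseMs: number,
  ctx: ReadingContext
): number {
  // Active navigation: speak immediately, like trimming leading whitespace.
  if (ctx.kind === "user-navigation") {
    return 0;
  }
  // Linear reading that starts at this element: skip the leading blank.
  if (ctx.isFirstUtterance) {
    return 0;
  }
  // Mid-stream in a linear rendering: the authored gap is meaningful.
  return Math.max(0, authorPauseMs);
}
```

With this sketch, an authored pause of two seconds would still separate phrases in the middle of a “read all” rendering, but would collapse to zero both when the user presses “next heading” and when linear reading begins at that element.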