Closed patrickhlauke closed 4 years ago
FWIW https://twitter.com/patrick_h_lauke/status/900325928076816384 coming up dry so far...
Thanks for bringing this to my attention.
The current definition of speech is lousy, as screen readers, which it claims should match it, are a better fit for screen (plus some extra capabilities), since they do work on the result of a visual layout. If that's really what it's for, it should be deprecated, because it is wrong.
If we want to use it for a set of user agents completely disjoint from visual UAs (e.g. “Siri/Alexa, read me the wikipedia page about strawman arguments”), then it would make some more sense as a media type. Which still leaves us with a question of whether anyone implements that.
Either way, it should not stay the way it is.
@tabatkins Any thoughts? Who do we get in touch with to know if they are interested in this for pure-speech UAs (not screen readers)?
The current definition of speech is lousy, as screen readers, which it claims should match it, are a better fit for screen (plus some extra capabilities), since they do work on the result of a visual layout. If that's really what it's for, it should be deprecated, because it is wrong.
Yup, and also noting that screen readers are used not just by 100% blind users, but by those with partial sight as well, and even by users with good eyesight but cognitive issues. Plus of course the fact that SRs sit on top of browsers, and the browsers clearly behave as screen user agents (although they may be able to detect there's AT running, they don't switch to pretending to be speech devices, which would be wrong anyway per the previous bit about SR use).
I only kept it because it was supposedly what things that used CSS Speech should identify as. If there's no actual use-cases (or the use-cases are all merely theoretical, and all UAs in reality just report "screen" and assistive agents are ok with this), then I'm totally fine with dropping it and just going with the screen/print dichotomy.
Right, but CSS speech isn't just meant for screen readers (although they can use it as well), but also for UAs that do a pure audio rendition of the document, without any reference to a 2D visual layout. For these, the speech media type isn't crazy, except I don't know if this is just a theoretical use case, or if there are audio UAs that implement it, or audio UAs that would implement it but just haven't got around to doing it yet.
My googlefu isn't very strong today, but I am pretty sure that there are EPUB UAs that can read the book aloud to you, instead of displaying it. I presume they actually work on marked up text and could support css-speech and this MQ, but I don't actually know and maybe they just work on plain text. We should check with these if they already support this or plan to, or have any kind of feedback about this.
At the very least, can the spec be updated to not claim "Matches screenreaders" ? Because we do know that that part is not true (in, as far as I'm aware, all current screenreader scenarios).
Sure, we'll either do that or drop the thing entirely. Either way, we need a WG resolution to make this normative change, so I'd rather get enough info ahead of time and do it right in one go. But if we can't find anything, that tells us what to do.
Marked as Needs Edits, because we can at least do the "stop claiming this is for screenreaders" part.
@frivoal, do you think you have enough data to bring up the rest of the issue to the WG?
It seems to me that what we’d be looking for is something that parses style sheets, like a browser does, but then builds its own layout tree (speech tree) after applying the rules. foo { display:none }, for instance, would remove foo from the speech tree, whether from the speech media type or the all or screen media type. But :focus foo { display: block } would put it back in. If display was used this way inside a MQ, you wouldn’t need a separate speak property.
Someone should just write a plug-in to do that.
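A minimal sketch of the "speech tree" idea described above, not an existing plug-in or UA behaviour. The `StyledNode` shape, `buildSpeechTree`, `spokenText` and `page` are all hypothetical names, and the sketch assumes the cascade has already produced a computed display value for each element.

```typescript
// Hypothetical node shape: a real UA would work on the DOM plus computed styles.
interface StyledNode {
  name: string;
  display: string; // computed value of the display property (assumed given)
  text: string;
  children: StyledNode[];
}

// Copy the tree, dropping every subtree whose root computes to display: none.
function buildSpeechTree(node: StyledNode): StyledNode | null {
  if (node.display === "none") {
    return null;
  }
  const keptChildren: StyledNode[] = [];
  for (const child of node.children) {
    const kept = buildSpeechTree(child);
    if (kept !== null) {
      keptChildren.push(kept);
    }
  }
  return {
    name: node.name,
    display: node.display,
    text: node.text,
    children: keptChildren,
  };
}

// Flatten the speech tree into the order in which text would be read out.
function spokenText(node: StyledNode | null): string[] {
  if (node === null) {
    return [];
  }
  const parts: string[] = node.text === "" ? [] : [node.text];
  for (const child of node.children) {
    for (const part of spokenText(child)) {
      parts.push(part);
    }
  }
  return parts;
}

// Illustrative page: the display value of foo is what the style rules decide.
function page(fooDisplay: string): StyledNode {
  return {
    name: "body",
    display: "block",
    text: "",
    children: [
      { name: "h1", display: "block", text: "Title", children: [] },
      { name: "foo", display: fooDisplay, text: "Details", children: [] },
    ],
  };
}

const spokenWhenHidden = spokenText(buildSpeechTree(page("none")));
const spokenWhenShown = spokenText(buildSpeechTree(page("block")));
```

With foo computed as display: none, only the heading text is left to read out; once a rule such as the :focus one sets it back to block, its text reappears in the spoken order.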
Is there any more information needed to bring this to the WG for a decision?
I've done the edits discussed above (“speech is for pure-audio UAs, not screen readers”). As for deprecating it altogether, someone needs to follow up the investigation started / suggested by the tweet from https://github.com/w3c/csswg-drafts/issues/1751#issuecomment-332539713, and then come back to the WG.
I may get to it eventually, but if anybody wants to dive in and report back, that'd be very much appreciated.
To add to the above, I think the speech media type is an important one to keep.
I suspect pure-audio UAs will soon evolve from the various virtual voice assistants (Apple Siri, Google, Amazon Alexa, ...). They have yet to allow browsing an actual HTML web page, but it's not a big leap from the current web-based actions. It'd be nice to have the handling in place to arrange things differently as needed, once this evolves.
Results for @media speech CSS query and media=speech on link element
https://www.powermapper.com/tests/screen-readers/content/media-query-speech/ https://www.powermapper.com/tests/screen-readers/content/media-link-speech/
TL;DR; not implemented in any common screen reader / browser combination
Results from the Speech section of http://css3test.com/ for common browsers:
Chrome 66: CSS speech support 0%
Safari 11: CSS speech support 0%
Firefox 60: CSS speech support 0%
Edge 17: CSS speech support 0%
Internet Explorer 11: CSS speech support 0%
Opera 53: CSS speech support 0%
... and bug trackers for different browser and AT vendors:
NVDA screen reader: not supported https://github.com/nvaccess/nvda/issues/4242
Mozilla: not supported https://bugzilla.mozilla.org/show_bug.cgi?id=1339987
Chrome: not supported https://bugs.chromium.org/p/chromium/issues/detail?id=369863 https://groups.google.com/forum/#!topic/axs-chrome-discuss/b6qcNhlealg
Opera: experimental (prefixed) support up to Opera 12.10 - removed in Opera 15 https://stackoverflow.com/a/13515280
@patrickhlauke can you advise if this issue is still relevant so APA should track, or if it's addressed somehow somewhere?
going through the same resources again that @dd8 listed (but noting that some of those were about CSS Speech Module http://www.w3.org/TR/css3-speech/ which is something different from the speech media type discussed here)
https://www.powermapper.com/tests/screen-readers/content/media-query-speech/ https://www.powermapper.com/tests/screen-readers/content/media-link-speech/
there is no change. no AT/browser combination seems to support it.
i don't have access to any pure audio devices (alexa or similar) to carry out any more manual tests, but conversely i've not heard of anything in this area being done/developed (i would have thought that if any of these devices offered a special custom support for speech media type, there would have been some specific developer relations type documentation floated around to get authors to start implementing things specifically for alexa and co...but I have not seen anything of the sort).
as such, i'd like to raise this again @frivoal ... can this be officially put to bed and removed? i see zero appetite from UA developers (which would need to be the ones implementing things for AT to then consume) on this...and keeping it around perpetuates the idea/hope that this media type will one day be implemented for real.
(to be more accurate about CSS speech module, while it does talk about aural and speech media types, the properties/spec itself seem standalone - to me at least; nothing would prevent user agents implementing any of the proposed/spec'd properties there, just in a regular screen media type context. it may be good to perhaps try and contact the CSS Speech Module group, in case they have more tangible information about plans for support, and/or to check if they'd be ok to reframe their non-normative background info to not anchor their properties to the speech media type which appears not to actually be implemented anywhere)
Overall, I don't think there is strong support for having the speech media type remain, but I would point out that the results outlined in https://github.com/w3c/csswg-drafts/issues/1751#issuecomment-390288685, while interesting, don't really help much with this: all the UAs described in that comment would be expected to be classified as screen, and therefore not as speech, since the two are mutually exclusive. This is mentioned in the spec. So finding that UAs which aren't expected to support speech indeed don't support speech doesn't tell us much about whether it is being used elsewhere as intended.
What would be needed to inform this decision is a survey of User Agents that only produce speech, without any visual representation.
Even if we don't find enough information about such UAs, there certainly doesn't seem to be a lot of enthusiasm for this media type, so I wouldn't feel particularly bad about deprecating it though. But let's not use the wrong data to justify this decision.
ah, good point @frivoal ... wondering then if the CSS Speech Module group has a list of UAs that do pure speech, and if they could help shine a light on this? (I would assume they must have some real-world UAs, if they had at least 2 interoperable implementations to pass the TR requirements?)
any chance somebody officially closer to this spec could ping the group / Daniel Weck @ DAISY?
i know this is a very minor thing in the grand scheme of CSS stuff, but...I see the speech media type occasionally being dragged out in accessibility discussions as a potential solution for things, and i'd like to have a bit more of a definitive answer on its viability.
CSS Speech Module never went past CR; it doesn't have 2 implementations.
As far as I know, speech only css implementations remain a theoretical possibility, not attested in the wild. They theoretically should match the speech media type, but I've never seen or heard of one.
It is a logical possibility, so the value isn't wrong. But it seems nobody uses it in practice, so it is not useful. If we were to propose introducing it today, we probably wouldn't. But it's there specified already. Should we remove it? Leave it there to indicate what the right thing to do is to a potential future speech-only implementation?
CSS Speech Module never went past CR; it doesn't have 2 implementations.
oh goodness, you're right. i sometimes can't see the wood for the trees (got dazzled by the /TR/ URL and didn't actually look properly).
It is a logical possibility, so the value isn't wrong. But it seems nobody uses it in practice, so it is not useful. [...] Should we remove it?
purely anecdotally, i find it being mentioned on occasion in accessibility discussions as if it were a real thing - which then always leads to having to explain that it's theoretical, not practical, and that it can't be used/relied on. removing it from the spec to match implementation reality makes most sense in my view, particularly considering how braille was deprecated (presumably for similar reasons) in the past.
a big part of the issue to me is that curious authors may come to the spec, and see the current wording
speech Matches screenreaders and similar devices that “read out” a page.
and think that that is a statement of fact/current reality. where, as far as we're aware, it demonstrably isn't. that very affirmative statement in the spec is doing potential damage and causing confusion among authors.
Leave it there to indicate what the right thing to do is to a potential future speech-only implementation?
i think the problems there would be the same that we saw with tv and handheld ... that authors didn't understand that when they defined something as tv then they can't define "general" things (meant to apply to tv as well) as screen, as media types are mutually exclusive (and the reason why, say, Opera had to basically ignore the tv media type, as it led to sites looking broken because authors didn't understand that).
so i think a future media feature may be far more beneficial than a media type for this sort of thing. features offer a far more nuanced approach that doesn't just slot device types into large and ill-defined "buckets".
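To make the tv / screen pitfall above concrete, here is a small sketch under stated assumptions: `LegacyMediaType`, `AuthorRule` and `applicableRules` are hypothetical names, and the model only captures the one point made in the thread, namely that a UA identifies as exactly one media type.

```typescript
// Hypothetical model: each UA reports exactly one media type.
type LegacyMediaType = "screen" | "print" | "tv" | "handheld" | "speech";

interface AuthorRule {
  media: LegacyMediaType | "all";
  declaration: string; // placeholder label for a block of declarations
}

// A rule applies when it targets "all" or the UA's single media type.
function applicableRules(uaType: LegacyMediaType, rules: AuthorRule[]): string[] {
  return rules
    .filter((rule) => rule.media === "all" || rule.media === uaType)
    .map((rule) => rule.declaration);
}

// The authoring mistake: "general" styles filed under screen, tweaks under tv.
const authorRules: AuthorRule[] = [
  { media: "screen", declaration: "general layout" },
  { media: "tv", declaration: "tv tweaks" },
];

const forTv = applicableRules("tv", authorRules);
const forScreen = applicableRules("screen", authorRules);
```

In this model a UA identifying as tv receives only the tweaks and none of the general layout, which matches the broken-looking sites described above. A media feature would not have this problem, because several features can be true for the same UA at once.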
Speech synthesis with configurable voices would be nice for preschool kids and other illiterates. Many sites (and apps) for young children currently play back a prerecorded audio file for navigational elements on hover. This feels a bit like (and is as inflexible as) prerendered bitmap images for headings, which was commonly seen before the advent of downloadable web fonts. This is quite different from speech synthesis for blind users who are usually used to a single voice and very fast playback speeds.
Voice-only or voice-first user agents and texts prerecorded by TTS systems are also common in transport and traffic environments, e.g. in car multimedia systems and with earphone devices used to listen to podcasts, audiobooks, music, navigation hints etc.
In other words, css-speech is very useful indeed and definitely deserves more interest by implementors, but the media type speech could safely be deprecated. However, MQ should introduce aural media features!
see the current wording
speech Matches screenreaders and similar devices that “read out” a page.
Oops. I didn't realize that this fix/clarification had only been made in the editor's draft, and that we were due for republishing.
At the very least, I need to get things in order and update the CR.
As to the rest of your point, I think you've got me mostly convinced. It's theoretically right, but practically unused, unclear if it ever will be used, and causing confusion. We should probably drop it.
We discussed this internally and Apple's opinion is that it is okay to deprecate the legacy media types for speech, braille, aural, etc.
As others have mentioned here, and we've discussed in #4868, media types (as opposed to media features) are mutually exclusive, so these type values cannot be used with screen media, making them useless for all screen-based assistive technology, including screen readers. Theoretically there may be some utility (like the "smart speaker" or "linearized audio" concepts) but 1) those use cases would be better addressed now as media features rather than media types, and 2) we're not aware of any assistive technology implementations of these legacy media types. Deprecation seems like the right option.
I also want to make a distinction that @patrickhlauke raised above: some comments seem to be conflating the speech media type with speech-related properties. I believe that deprecating the speech media type will have no negative impact on the speech-related properties defined in "CSS 3 Speech" (now known as "CSS Speech 1"). WebKit+VoiceOver on iOS is the only partial implementation I'm aware of, but it applies the speak: and speak-as: properties to all media types and in practice, it's only used with the screen media type.
Thanks.
I think the only impact on CSS Speech will be the need to slightly tweak the informative prose in https://www.w3.org/TR/css-speech-1/#background
/!\ Please note that I won't be able to attend the discussion scheduled about this during the F2F.
The gist of my thoughts on this is that the speech media type is not something that should exist, because there is no reason not to use the speech properties on elements all the time. In the same way we don't have a media query that's (has-a-display) on which we condition setting display:grid on some elements. It's self-obvious that a browser that doesn't have a display will not need to honor display:grid if it is not expected to have an effect for its output.
People who rely on assistive technologies do not want to disclose that, and browsers like Microsoft Edge have a "read-aloud" feature that users can activate on any piece of text on any page, and would benefit from annotation about how to pronounce the text in a nice-to-hear way.
If the discussion results in removing the display type, sounds good, but if the discussion tilts the other way, I would like to ask to re-discuss this at a further date, for instance next Wednesday.
I would want to hear from @LJWatson before resolving on this issue.
@FremyCompany I agree with most of what you said. As for "People who rely on assistive technologies do not want to disclose that", that is also true, but it seems a little disconnected from this feature: @media speech { } is not relevant for screen readers. Screen readers generate speech, but they are not standalone sound-only user agents: they're paired with a @media screen user agent and do take the visual into account as well. Media types are exclusive, so @media speech applies to media that only generate sound and don't match @media screen. Screen readers don't match @media speech, and it would be a spec violation for them to match it.
@media speech is only for audio-only UAs, for example alexa/siri or linear rendering of audio books. Some software is roughly like this, but I have not run across any that implements the speech media type. Others have also looked, with no more success. And as you said, even in the context of audio-only UAs, it's far from clear that using @media speech would be helpful, as audio related properties can be set by the author unconditionally, and merely get ignored when rendering the page visually.
Nonetheless, it happens repeatedly that people get confused and believe that speech means screen readers. So we should probably drop it and move it to the deprecated media types, where it is syntactically valid but defined to never match.
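As a rough illustration of what "syntactically valid but defined to never match" would mean in practice, here is a sketch, assuming speech joins the existing deprecated media types. `mediaTypeMatches`, `UaType` and the constant names are hypothetical; this is a toy model of media type matching only, not of a full media query evaluator.

```typescript
// Deprecated media types: still recognised, but they match nothing.
// "speech" is listed here on the assumption that the deprecation goes ahead.
const DEPRECATED_TYPES: string[] = [
  "tty", "tv", "projection", "handheld", "braille", "embossed", "aural", "speech",
];

// In this model a UA is either a screen UA or a print UA.
type UaType = "screen" | "print";

// Evaluate only the media type part of a query against the UA's type.
function mediaTypeMatches(queryType: string, uaType: UaType): boolean {
  const wanted = queryType.toLowerCase();
  if (DEPRECATED_TYPES.indexOf(wanted) !== -1) {
    return false; // valid syntax, but defined to never match
  }
  if (wanted === "all") {
    return true;
  }
  return wanted === uaType;
}

// A screen reader sits on top of a browser that identifies as screen.
const browserWithScreenReader: UaType = "screen";
const speechResult = mediaTypeMatches("speech", browserWithScreenReader);
const screenResult = mediaTypeMatches("screen", browserWithScreenReader);
const allResult = mediaTypeMatches("all", browserWithScreenReader);
```

Under these assumptions, styles inside @media speech never reach a screen reader user, while anything under screen or all does, which is the behaviour the thread describes.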
@LJWatson, the csswg was inclined to agree to deprecate this, but we didn't resolve as we wanted to hear your point of view on this.
Thanks to @Fantasai and frivoal for reminding me this was being discussed by the WG. I agree with the suggestion to deprecate speech as a media type.
As @patrickhlauke notes it is not supported by screen readers and they, as @frivoal notes, do not operate independently of the screen in any case.
For voice UI in the browser the CSS Speech properties have far more promise.
The CSS Working Group just discussed [mediaqueries-4] Deprecate 'speech' media type as well?, and agreed to the following:
RESOLVED: deprecate 'speech' and have it have the same behavior as other deprecated MQs
Possibly too late now for MQ4, but just wondering if there is any known support in the wild for the "speech" media type https://www.w3.org/TR/mediaqueries-4/#media-types
To my knowledge, the answer is no...which makes me wonder if that should also be deprecated along with braille, aural and co.