nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows
https://www.nvaccess.org/

Support for W3C's CSS Speech Module #4242

Closed: nvaccessAuto closed this issue 9 months ago

nvaccessAuto commented 10 years ago

Reported by mgifford on 2014-07-02 00:20

I'm trying to see if there is a way to improve the accessibility for http://kushagragour.in/lab/hint/

Which is now part of Drupal 8.

I'd like to see support for http://www.w3.org/TR/css3-speech/ so that we could either insert a pause or change the voice family before the tooltip is spoken.
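
For concreteness, a hypothetical sketch of what such a rule might look like; the selector assumes hint.css's tooltip pseudo-element, and no current browser/screen reader combination honours these properties:

```css
/* Hypothetical: .hint--top::after is assumed from the hint.css library. */
.hint--top::after {
  pause-before: 500ms;    /* CSS Speech: brief silence before the tooltip */
  voice-family: female;   /* CSS Speech: a different voice for the tooltip */
}
```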

Right now in VoiceOver it is all read together. In ChromeVox it gets ignored. However, there should be some means to convey that the tooltip is distinct aurally from the text it is describing.

This is probably a lot bigger than NVDA. Does NVDA support the CSS Speech Module?

nvaccessAuto commented 10 years ago

Comment 2 by jteh on 2014-07-02 06:18

To directly answer your question, no, the CSS speech module is not supported. This would need significant work in all existing browsers and screen readers and may even require additions to current accessibility APIs. This is not likely to happen any time soon.

Whether we should even do this is somewhat controversial. A screen reader is a bit different to an interface designed specifically for speech. The intention is to represent all functionality available to a "screen" user, even if, in doing so, the speech might not be as "friendly" as one might expect from a specialised speech interface. Being able to tell a screen reader how numbers should be read or a name should be pronounced might be ideal, though even here, we would hit problems mapping this back to screen position, for example. However, we wouldn't want the content to be made entirely different.

As to this specific case, generally, secondary content such as a tooltip is exposed separately from the primary content; e.g. as the "description" of the accessible element. For example, if you use the @title attribute on a link, the link content will be the link's name and the title will be its description. This way, the two types of content are separated and the screen reader can choose how to handle them. This can be done with ARIA attributes; e.g. aria-labelledby and aria-describedby. I feel this would be the more appropriate way to go here; i.e. expose them separately so that the AT decides how to handle them, rather than the library choosing a specific speech experience. The experience chosen by the library might be completely different from how a given screen reader normally reports tooltips.
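
For illustration, a minimal sketch of the separation described above (the ids and text are invented for this example):

```html
<!-- The link text becomes the accessible name; @title becomes the
     accessible description. -->
<a href="/save" title="Saves the draft without publishing">Save</a>

<!-- The same separation made explicit with ARIA: the tooltip content
     becomes the button's description, and the AT decides how to speak it. -->
<button aria-describedby="save-tip">Save</button>
<div role="tooltip" id="save-tip">Saves the draft without publishing</div>
```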

I'm leaving this open because it certainly needs further discussion, but it's very low priority at this stage.

nvaccessAuto commented 10 years ago

Comment 3 by mgifford on 2014-07-02 13:09

Very interesting! Thanks for taking the time to detail this.

I have asked about this in Firefox https://bugzilla.mozilla.org/show_bug.cgi?id=47159 & Chrome https://code.google.com/p/chromium/issues/detail?id=369863&q=css3%20speech&colspec=ID%20Pri%20M%20Iteration%20ReleaseBlock%20Cr%20Status%20Owner%20Summary%20OS%20Modified

But neither supports it yet: http://css3test.com/

I am sure that any of these elements could easily be abused in ways that make content less accessible.

speak-as, pause, rest and cue all seem like they could be quite useful if done properly. But as with the title attribute, it's so easy to get it wrong. I've felt that it would be nice to use voice-family consistently with, say, an admin theme, or perhaps with administration functions provided by the CMS. If there were support for this, it might provide the same aural cues that we have visually. Are there places where the pros and cons of this have been publicly debated?
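
For reference, a sketch of what those properties look like in the css3-speech draft (selectors and values invented for illustration; no shipping support):

```css
.phone { speak-as: digits; }          /* read "5 5 5…" rather than a number */
h2     { cue-before: url(chime.wav);  /* play an audio cue before headings */
         pause-after: 1s; }           /* then a brief silence after them */
.admin { voice-family: female; }      /* the "consistent admin voice" idea */
```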

But yes, on the specific issue of tooltips, my sense is that the @title attribute has been badly abused and confused with alt text in general. My assumption has been that most screen reader users simply ignore the title as it usually isn't useful.

I don't know that there is a "normal" for tooltips. I'm assuming the Open Ajax Alliance & Dojo nightly examples are still great ones: http://www.w3.org/WAI/PF/aria-practices/#tooltip

I'm assuming NVDA supports role="tooltip", and it really does feel like a describedby type of relationship.

Hopefully we can keep this conversation going a bit more.

nvaccessAuto commented 10 years ago

Comment 4 by jteh (in reply to comment 3) on 2014-07-03 22:50

Replying to mgifford:

> I've felt that it would be nice to use voice-family consistently with, say, an admin theme, or perhaps with administration functions provided by the CMS. If there were support for this, it might provide the same aural cues that we have visually.

It's certainly a tricky issue. On the surface, it does seem to make sense that if you can style something visually, you should be able to style it aurally. However, a visual user doesn't require an intermediary tool to present information to them in a primarily linear fashion, so it is a more direct mapping. One problem is that a screen reader might use certain voices for specific purposes, so if something else uses these, it might be very confusing.

> Are there places where the pros and cons of this have been publicly debated?

Not that I know of.

> But yes, on the specific issue of tooltips, my sense is that the @title attribute has been badly abused and confused with alt text in general. My assumption has been that most screen reader users simply ignore the title as it usually isn't useful.

That's not really my experience, especially on form fields and links.

> I'm assuming NVDA supports role="tooltip", and it really does feel like a describedby type of relationship.

Actually, NVDA doesn't really care about the tooltip role here. The key point is that aria-describedby references the tooltip, so the tooltip content becomes the "description" of the element in question. An NVDA user can then query this on demand and it is also reported when the element is focused, just as a sighted user would generally have to mouse over the element (or interact with it in some other way).

bhavyashah commented 7 years ago

@jcsteh's https://github.com/nvaccess/nvda/issues/4242#issuecomment-155321636 provides a series of seemingly compelling arguments about why this issue is extremely difficult to resolve, why it might be controversial to implement in the first place, etc. Keeping that in mind, I would kindly invite developers to further discuss this support request for a module that I don't believe many NVDA users want in the first place, that requires significant code rewrites according to Jamie, and that poses several other UX and technical challenges. On the surface at least, wontfix or P4 sounds justified.

sKopheK commented 6 years ago

Any chance of supporting "@media speech" at least? It seems to be totally ignored by NVDA :/

brennanyoung commented 6 years ago

I'd also like to keep this discussion warm, and argue against closing the issue just yet.

Certainly, the rationale for not implementing CSS 3 Speech support in screen readers is opaque and under-described, and even though there may be strong arguments against such an implementation, there are also strong arguments in favour. The debate needs a proper and public airing, so that content developers can easily understand the reasoning. I've not found it easy to find relevant discussions on this subject.

The w3c speech API has barely begun to get out there in the wild. I think the wisest course of action is to follow that rollout closely, and see whether it can somehow enrich the experience in NVDA and other screenreaders. If it still seems like a canard at that point, then by all means close.

FWIW, I've already noticed web developers rushing ahead and implementing 'styled speech' in ways that conflict with WCAG recommendations. If I unilaterally get my website to voice its content (using the speech api or just extensive use of pre-recorded html5 audio), how will screenreaders handle the collision? It might be a rare thing today, but I expect it will be more common in the future as developers attempt to be WCAG compliant. At the very least, this particular issue should not be ignored.

Back to CSS 3 speech: There is (I think) a compelling argument for mapping different semantics onto different 'kinds' of speech. There seems to be a use case for (say) aria-live regions to be distinguished from control labels, and each of those distinguished again from static text content. (etc.) More fine-grained or content-specific semantic differences are easy to imagine.

When I say 'distinguish', I mean that it could be spoken in a different kind of voice (perhaps something as subtle as using the azimuth setting, or as radical as a different gender).

One way this might be done could be to link particular aria roles to particular voice settings using css 3 speech properties. Another way might be to offer options to make such mappings in the screenreader preferences, though they are already very complex.
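
Under the first approach, such a mapping might hypothetically look like the following; no engine implements this today, and the selectors and values are purely illustrative:

```css
/* Hypothetical role-to-voice mapping via CSS Speech properties. */
[role="alert"], [aria-live] { voice-family: male; voice-stress: strong; }
[role="tooltip"]            { voice-volume: soft; pause-before: 300ms; }
```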

I'd like to invite anyone interested to read this article, which breaks down audio into four 'typologies' (essentially, semantic categories). These categories might not be the best fit for general web content, but they could help to form a 'mental model' for how different audio characteristics could be used to denote different semantics.

dd8 commented 6 years ago

> Any chance of supporting "@media speech" at least? It seems to be totally ignored by NVDA :/

@sKopheK the Media Queries 4 spec makes it explicit that screen readers should match the 'screen' media type (and not 'speech'), because they read the screen: https://drafts.csswg.org/mediaqueries-4/#media-types

All the screen readers we tested (VoiceOver, JAWS, NVDA, WindowEyes, System Access and Dolphin) match @media screen and @media all, but not @media speech or @media aural: https://www.powermapper.com/tests/screen-readers/content/media-query-speech/
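
In practice that means a pair of rules like this behaves asymmetrically, assuming the behaviour described in those tests:

```css
@media screen {
  .icon::before { content: "★"; }  /* matched by NVDA et al., since
                                      screen readers read the screen */
}
@media speech {
  .icon::before { content: none; } /* never matched by the screen
                                      readers tested above */
}
```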

sKopheK commented 6 years ago

Thanks for the explanation. We faced this issue when trying to stop screen readers from reading icons rendered using web fonts (via the content property in CSS). Using the aria-hidden attribute on a separate tag for the icon would help, but it's too much unnecessary HTML for something that usually has only visual meaning.

dd8 commented 6 years ago

@sKopheK There is a way to provide a content alternative in CSS, but I don't know how well supported it is: https://www.w3.org/TR/css-content-3/#accessibility
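
The relevant feature there is the alternative-text syntax for generated content. A minimal sketch, assuming the syntax from that draft (Chromium has shipped it; support elsewhere varies):

```css
/* The string after the slash is the accessible alternative, so an
   empty string hides the glyph from AT without any extra markup. */
.icon-star::before {
  content: "\2605" / "";  /* visual star, empty accessible name */
}
```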

brennanyoung commented 6 years ago

Our live region updates every couple of seconds, and our product is all about training rapid responses (for first aid). Urgency is an intentional part of the experience, but confusing the UI labels with the fictional accident is not.

We just did some user tests and can confirm that, in our web app, the babble of aria-live content spoken in a contiguous stream alongside UI accessible names, all announced in the exact same voice, cripples usability. This was with aria-live="polite", by the way, which is supposed to be the least pushy option short of pure silence. I had hoped for gaps, at least.
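
For context, the pattern under test was nothing exotic; roughly the following, with the content invented for illustration:

```html
<!-- A "polite" live region updating every couple of seconds, spoken
     in the same voice as every control label around it. -->
<div aria-live="polite" aria-atomic="true">
  Casualty is not breathing. Start chest compressions.
</div>
<button>Call emergency services</button>
```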

We may have to abandon aria-live altogether and roll our own 'live region', just to get a different voice.

We really need to be able to distinguish semantics with different voice settings, whether with CSS, with distinct aria-live 'channels', or via some other mechanism.

By all means, let it be up to the user what the details of those voice choices are, in much the same way as the user can choose font-family settings for 'serif', 'sans-serif', 'monospace' or 'fantasy' in the browser preferences.

brennanyoung commented 6 years ago

Just found this, which states:

> Ideas for Settings and Heuristics: Allow for a different voice (in text-to-speech) or other varying presentational characteristics to set live changes apart.

Adriani90 commented 5 years ago

@derekriemer, @jcsteh, @michaelDCurran, @feerrenrut your thoughts are very appreciated.

josephsl commented 5 years ago

Also @MarcoZehe and anyone from Microsoft as well.

oferb commented 4 years ago

Yet another use-case:

Being able to create something like Emacspeak for code, where the semantic meaning of words is translated to a different pitch (e.g. variable names sound different from class names). This is similar to syntax highlighting for sighted developers.

https://en.m.wikipedia.org/wiki/Emacspeak
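
If CSS Speech were ever supported, Emacspeak-style 'voice lock' might hypothetically be approximated on syntax-highlighted code like this (the class names follow a common highlighter convention, and voice-pitch is unsupported everywhere today):

```css
/* Hypothetical pitch cues per token type, in the spirit of Emacspeak. */
.token.keyword  { voice-pitch: high; }
.token.variable { voice-pitch: low; }
.token.comment  { voice-volume: soft; voice-rate: fast; }
```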

I personally think that if a page provides speaking hints specifically for screen readers, they are there to help screen reader users, added with good intentions and probably not as an afterthought. Why not give developers the option to provide richer experiences?

oferb commented 4 years ago

Could this kind of support be implemented as an NVDA add-on? For example, having NVDA read the following while emphasizing "lazy": "The quick brown fox jumps over the lazy dog"

Emphasis could be done using a different pitch, volume, delay, etc. This would be similar to how people would actually say the sentence when reading it out loud.
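
As a rough feasibility sketch, not a finished add-on: NVDA's speech subsystem does accept prosody commands interleaved in a speech sequence, so a global plugin could do something like the following. speech.speak and PitchCommand are real NVDA APIs; the script and gesture binding are invented for this example.

```python
# A minimal sketch of an NVDA global plugin speaking one emphasised word.
import globalPluginHandler
import speech
from speech.commands import PitchCommand

class GlobalPlugin(globalPluginHandler.GlobalPlugin):
    def script_demoEmphasis(self, gesture):
        # Strings and prosody commands are interleaved in one sequence.
        speech.speak([
            "The quick brown fox jumps over the ",
            PitchCommand(offset=30),  # raise pitch for the emphasised word
            "lazy",
            PitchCommand(),           # restore the default pitch
            " dog",
        ])

    __gestures = {
        "kb:NVDA+shift+e": "script_demoEmphasis",
    }
```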

feerrenrut commented 4 years ago

Just reading through this now. Given I'm not familiar with the background of this, hopefully I haven't totally misunderstood the point; apologies if so.

The reasoning given in support on this issue is mostly about allowing web developers to control how differing semantics are presented to the user. I'd argue this is the wrong place to map the presentation of semantics. The likely outcome would be different websites providing conflicting, or at least inconsistent, presentations of semantics, which will only be more confusing for the user. It also guarantees inconsistency with desktop applications. I strongly think this mapping should be done by the screen reader, ideally configurable by the user to account for any specific preferences or needs they may have. The experiment with aria-live is an interesting one, and likely something we could resolve within NVDA.

I can imagine use cases for entertainment-type applications, e.g. ebooks, games, or similar. However, to reduce cognitive load and to meet the preferences and needs of the user, the screen reader should provide a consistent experience for consuming information and interacting with applications (web or otherwise).

oferb commented 4 years ago

Cool, so what do you have in mind for this mapping that is done by the screen reader?

sidnc86 commented 2 years ago

Pardon my limited knowledge of the CSS3 Speech API; I have started reading about it quite recently. But I think, from a web developer's perspective, implementing speak: none; in CSS instead of aria-hidden in HTML would be like using a hypothetical text-heading-level: 1; in CSS instead of an h1 element in HTML. Semantics are delivered to assistive technologies by HTML, so expecting assistive technologies to go looking for semantic information in CSS rather defeats the purpose of segregating semantics from design.

derekriemer commented 2 years ago

I think the major difference is that speak:none would presumably still allow braille output, whereas aria-hidden hides the content from both speech and braille.
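
Side by side, as a sketch (note that css-speech-1 spells the value speak: never, while the older aural spec used none; neither is honoured by current engines):

```html
<span aria-hidden="true">★</span>   <!-- removed from the accessibility
                                         tree: hidden in speech AND braille -->
<span style="speak: none">★</span>  <!-- hypothetically: silenced in speech
                                         only, still rendered in braille -->
```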

mgifford commented 1 year ago

Adding an older thread here: https://sourceforge.net/p/nvda/lists/nvda-commits/thread/054.dc1ab85e62b6cf0bbf5f855dadb7a9eb%40nvaccess.org/

But also see https://css-tricks.com/lets-talk-speech-css/ and, more recently, https://www.meetup.com/css-cafe/events/291837233/

Adriani90 commented 10 months ago

The newest specifications are documented here: https://www.w3.org/TR/css-speech-1/

As far as I understand, this speech module is not really aimed only at screen readers; it is especially for use in industries and situations where actively controlling a device is not appropriate, e.g. while driving a car, or when using the read-aloud feature on websites such as newspapers, where people can click a button to have an article read out loud. Another use case is reading a PDF aloud in Edge or Adobe Reader with their internal speech synthesizers. With the CSS Speech module, the web author can specify how his or her website should be read aloud. This is very useful for people with cognitive disabilities such as Down's Syndrome, for people who need easy language, for people with a bit of visual ability, and for everyone who just wants to hear part of a website read by a voice in a specific situation.

Still, I am not sure about the use case for a screen reader user; that is a totally different scenario. In NVDA we have the framework implemented in #7599, which might make support for the CSS Speech module possible, but I am not convinced that a web author can really provide the best user experience for a screen reader user. How do we then make sure that deaf-blind people have the same experience in braille? And as far as I understand, this CSS Speech module takes effect when the user clicks on something; it does not follow the focus and adjust the voice while you are moving the focus around. Is my understanding correct?

Anyway, if we put control of the speech in the hands of a web author, this should definitely be an optional setting in the screen reader. There are still too many things to be considered, and it would probably lead to controversial opinions in the community about the user experience. In the end, I guess it would overwhelm everyone who creates a website if we set any standards for the speech behaviour while reading a website with a screen reader; this is a much more complex task than a read-aloud button on news articles, and I am not sure web developers really want to take this burden on themselves.

However, I would love to see a website implemented with a CSS Speech use case in mind that simulates a screen reader navigating the website, so we could test in a prototype what it would sound like. On the other hand, screen reader developers might take this on at a larger scale and implement more global use cases with #7599 in mind. That might make more sense in the end.

Adriani90 commented 9 months ago

Given that no clear screen reader use case has been documented in this discussion, I am closing this for now. Please contribute a concrete screen-reader-related use case and we can reopen, or open a new discussion with a screen reader use case in mind.