w3c / wcag

Web Content Accessibility Guidelines
https://w3c.github.io/wcag/guidelines/22/

Is using the track element to provide audio descriptions sufficient? #599

Closed DavidMacDonald closed 5 years ago

DavidMacDonald commented 5 years ago

In SC H96 we allow the track element, but there is some concern in the community that it is not working in browsers. See this twitter thread https://twitter.com/pauljadam/status/1090273832999493633 Suggest we demote it to advisory until it is supported.

johnfoliot commented 5 years ago

Not to stir the pot (too much) David, but...

If we've had this Advisory Technique since prior to WCAG 2.1, and have told authors that if they used this technique they would be in Technical compliance, yet it has little support https://caniuse.com/#feat=audiotracks ...

Layer this over the other discussion about using Microdata for SC 1.3.5, and I see a lot of parallels here.

I'm not advocating for either of these techniques, but should an entity use either, can we really say they are non-conformant to the normative text of WCAG 2.1? (I'll also note that this SC is one that is not dependent on AT, right?)

NOTE: I do agree that this is a topic for further discussion as it relates to Silver, but given that we have WCAG 2.1 today, and that the mass migration of the web to the next gen requirements is going to take longer than it will take to just publish Silver, I truly think we need to be clear on this concept in relationship to WCAG 2.0 and 2.1, and NOT wait for Silver to answer this question.

Thoughts?

JF

On Tue, Jan 29, 2019 at 3:28 PM David MacDonald notifications@github.com wrote:

In SC H96 https://www.w3.org/WAI/WCAG21/Techniques/html/H96 we allow the track element, but there is some concern in the community that it is not working in browsers. See this twitter thread https://twitter.com/pauljadam/status/1090273832999493633 Suggest we demote it to advisory until it is supported.


-- John Foliot | Principal Accessibility Strategist | W3C AC Representative Deque Systems - Accessibility for Good deque.com

mraccess77 commented 5 years ago

@johnfoliot wrote:

> I'm not advocating for either of these techniques, but should an entity use either, can we really say they are non-conformant to the normative text of WCAG 2.1? (I'll also note that this SC is one that is not dependent on AT, right?)

The SC can be met -- but if there is not accessibility support, then you can't meet WCAG conformance requirement 4, which permits only accessibility-supported ways of using technologies. This has always been the case, and in the past we have pulled sufficient techniques that failed this bar years after they were created. One that comes to mind is the use of headings to support SC 2.4.4.

OwenEdwards commented 5 years ago

Folks, I just posted on Paul's twitter thread; the video.js framework polyfills support for HTML <track kind="descriptions"> within a <video> element. It's not perfect, and the discussion ought to be whether it is good enough, but I specifically created that feature in video.js to demonstrate how text track audio descriptions (aka "TVD") would work, because no browser supports it natively.
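For anyone following along, the markup pattern H96 describes looks roughly like this (the file names and label here are illustrative, not taken from the technique):

```html
<video controls>
  <!-- main program video; source file is illustrative -->
  <source src="movie.mp4" type="video/mp4">
  <!-- WebVTT text track carrying the audio descriptions -->
  <track kind="descriptions" src="movie-descriptions.vtt"
         srclang="en" label="Audio Descriptions">
</video>
```

The whole debate in this issue is about how (and whether) anything actually renders or voices that `kind="descriptions"` track for the user.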

On the other hand, https://caniuse.com/#feat=audiotracks covers a separate API/feature related to distinct audio tracks, not text tracks. There's a demo of video.js with a separate audio-description audio track at https://videojs.com/advanced/#elephantsdream, which uses the HLS format for the video/audio/text tracks. (Note that the demo doesn't work on IE; it is possible to make it work on IE11, but that page isn't set up for IE.)

Clearly the separate audio track is a better solution to audio description; the question is, as I mentioned, whether TVD is good enough. But it certainly is supported by video.js. If you try it out and have any feedback, I encourage you to file an issue at https://github.com/videojs/video.js/issues.

awkawk commented 5 years ago

David's comment suggested that we make this technique advisory, but it already is advisory (as John noted, though that may have been obscured by other points). Is there any reason to change that status?

Perhaps we can point to the videojs example as a resource?

awkawk commented 5 years ago

@DavidMacDonald?

DavidMacDonald commented 5 years ago

@awkawk Thinking through this, I propose we retire the technique until it offers some reliable value. An advisory technique has historically been something that "helps" accessibility; in this case, it doesn't appear to help anything. Advisory techniques are also a way for devs to go beyond the minimum requirements and add some extra benefit for users. That is not the case with this technique. It feels a bit like "longdesc" right now.

OwenEdwards commented 5 years ago

Thinking about this some more, I'm curious how this became a "Technique"? I know Google had an example of it at one point, using a Chrome Extension (perhaps @silviapfeiffer worked on that?), and maybe that was while the WebVTT format was evolving, so it was reasonably possible that browsers might implement support for kind="descriptions". But even with the technical support (in browsers, through an Extension, or using something like Video.js to polyfill support for it), it's not clear that Text Video Description (TVD) is actually viable.

I guess WGBH/NCAM did some work on it too, and the CADET tool allows authoring of text description tracks. But I'm not clear on how CADET expects that TVD to be rendered for a user.

gkatsev commented 5 years ago

At FOMS 2018, we discussed kind="descriptions" and came up with an idea for a polyfill using the speech synthesis API. The idea is to use speech synthesis to read out the text track; if the end of the cue arrives while we are still speaking, we pause the video until the speech is complete and resume when the end event comes in. If we finish early, there is no need to pause the video. Unfortunately, no one has had time to make a proof of concept of this yet.
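The idea above can be sketched in a few lines. This is only an illustration of the approach, not the FOMS prototype; the function name is made up, and it assumes a browser with the Web Speech API and cue enter/exit events:

```javascript
// Sketch of the FOMS 2018 polyfill idea: voice kind="descriptions" cues
// with the Web Speech API, pausing the video when speech outlasts the cue.
// Assumes a browser environment; attachDescriptionVoicing is illustrative.
function attachDescriptionVoicing(video) {
  const track = Array.from(video.textTracks)
    .find((t) => t.kind === 'descriptions');
  if (!track) return;

  track.mode = 'hidden'; // fire cue events without rendering the text

  track.addEventListener('cuechange', () => {
    for (const cue of Array.from(track.activeCues)) {
      const utterance = new SpeechSynthesisUtterance(cue.text);

      // If the cue's time window closes while speech is still in
      // progress, pause the video until the utterance finishes.
      const onExit = () => {
        if (speechSynthesis.speaking) video.pause();
      };
      cue.addEventListener('exit', onExit, { once: true });

      utterance.onend = () => {
        cue.removeEventListener('exit', onExit);
        if (video.paused) video.play(); // resume once speech completes
      };
      speechSynthesis.speak(utterance);
    }
  });
}
```

Usage would be something like `attachDescriptionVoicing(document.querySelector('video'))`; a real polyfill would also need to handle seeking, user-initiated pauses, speech cancellation, and overlapping cues.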

silviapfeiffer commented 5 years ago

We had that working more than 6 years ago; see https://gingertech.net/2013/04/22/summary-video-accessibility-talk/ (not sure if it still works). We used a browser extension to do the speech synthesis and the pausing. It worked really well, and the blind people who tested it liked it. It needs somebody leading the charge to make it happen.

awkawk commented 5 years ago

On February 12, 2019 the WG decided to retain the technique, but added a note to clarify the current implementation status: c51995c

lauracarlson commented 5 years ago

@awkawk , @alastc , @johnfoliot would it make sense to add the note to a “User Agent Support Notes” doc?

At one time there was a "User Agent and Assistive Technology Support Notes" heading in technique documents. Check H46 for example. That had a link to a User Agent and Assistive Technology Support Notes document.

2.1 doesn't seem to have the UA headings section. https://www.w3.org/WAI/WCAG21/Techniques/html/H46

Maybe it was too much of a maintenance issue to keep it up to date?

awkawk commented 5 years ago

@lauracarlson I think that this is more of a general note, so it makes sense this way. Filing an issue for the UA notes separately.

OwenEdwards commented 5 years ago

@awkawk - I strongly disagree with the wording of the note in c51995c3d88c0ee50cd3d4365a973358893f1999. There is support for text track audio description, via the video.js framework (I've finally been able to get an example publicly available at https://videojs.com/advanced/#elephantsdreammp4). There is no native UA support for this technique, and even the video.js implementation has some limitations/issues, but to say "there is no practical support for this technique available for developers to utilize" is simply not true.

Something like "there is no native UA support for this technique, and the technique has some unique limitations/issues which need further work" would be much more appropriate.

awkawk commented 5 years ago

@OwenEdwards I haven't tried this - do screen readers voice the text of the descriptions when they are displayed?

OwenEdwards commented 5 years ago

@awkawk yes. That's how H96 reads (as do other W3C documents that mention "TVD" - text video description), but other than the Google self-voicing demo that @silviapfeiffer mentioned, no one ever built it out, so it was purely theoretical. Video.js actually implements it, so now the discussion can move past "is it feasible?" to "is it good enough?" And what is good enough for audio description?

awkawk commented 5 years ago

I just tried it on a Mac with VoiceOver and it is super cool. Great work.

Others? What do you think?

awkawk commented 5 years ago

Reviewing the twitter thread, it seems that the potential problem with this technique is that a screen reader user might slow down the rate of speech, and then the screen reader would be speaking over the program audio for some amount of time. For example, suppose there is a 2-second gap in the program audio to allow a short description. With recorded audio, the author knows that a description like "a man enters the room carrying a box" will fit in the two-second space before someone in the video starts speaking, but if the user has a screen reader set to speak slowly, the synthesized speech may not have finished.

To me this calls for authors to employ a time cushion to allow for some variability in speech rates, but I don't want to discourage this approach, because audio description is still rare online, probably in no small part due to the production costs involved in recording audio.

OwenEdwards commented 5 years ago

@awkawk exactly - what you’re talking about is part of what I’m referring to as the “qualitative” (good enough) issues with Text Video Description (TVD). We thought about a lot of this (and more) when I worked at the Video Description Research and Development Center (VDRDC), but it wasn’t the main thrust of our grant-funded work, so we didn’t get around to pursuing or publishing conclusions.

I’ve continued to work on Video Description since leaving the VDRDC (including Text Description, recorded Audio Description, Extended Description, etc.), and one of the problems with gathering more information about issues and solutions was that there weren’t any major platforms which supported Text Video Description - so “quantitatively” there was no way to get to the point of saying whether it was “qualitatively” a viable solution; all the previous work was very much speculative. It was a little like saying that an alt attribute would be a good enough solution for making images accessible to screen reader users, but never implementing it (so maybe more like “longdesc”!!). How can anyone say what is and isn’t good enough alt text if the mechanism itself has never been implemented?

So there needs to be a much broader discussion about the needs, issues and solutions around TVD, rather than each individual noting something that occurs to them about it. I have a lot of information, including from tracking things like the Twitter thread you mention over the past several years, but I’m by no means the only person with knowledge. We need to consider how we pull together this knowledge, who will do what research and experiments to try solutions and gather best practices, etc. That may be beyond W3C/ARIA’s scope, but I object to W3C/ARIA proposing it as a viable solution without further work, and then pulling the plug on it with a single-sentence editorial change without doing any more research, or at least studying what work is currently being done.

alastc commented 5 years ago

The change was accepted and the note was updated via #624