w3c / wcag

Web Content Accessibility Guidelines
https://w3c.github.io/wcag/guidelines/22/

Revisiting imbalance between 1.2.4 Captions (Live) (AA) and 1.2.9 Audio-only (Live) (AAA) #795

Open patrickhlauke opened 5 years ago

patrickhlauke commented 5 years ago

This picks up something I already noted about two years ago, but could maybe be discussed in the context of WCAG 2.2/silver ... https://lists.w3.org/Archives/Public/w3c-wai-gl/2017JulSep/0052.html

In short, there's currently a weird imbalance in requiring live audio+video to have captions at AA (SC 1.2.4), while for an audio-only live stream the requirement is only at AAA. The rationale at the time, it seems, was that live audio only was more common (internet radio style) back when WCAG 2.0 was finalised, compared to the relatively uncommon audio+video live streaming (which at the time was technically onerous, not frequently used, etc). Nowadays, the opposite is likely true: there's much more audio+video live content being streamed (think twitch, let's plays, live webinars, and so on) compared to audio only. And it's technically just as onerous to provide live captions for either audio only or audio+video live content.

Taking it to the extreme, say an author wants to meet AA and wants to do a live vlog or something. If they just did live audio only, they'd be off the hook with regards to captions. As soon as they turn on their camera as well to show their face while vlogging, all of a sudden they're failing the live captions requirement. This seems...odd.

mraccess77 commented 5 years ago

I think it’s a question of whether a talking head video is really synchronized media or not

patrickhlauke commented 5 years ago

i believe the answer is yes? https://www.w3.org/WAI/WCAG21/Understanding/captions-live.html has as example "A Web cast: A news organization provides a live, captioned Web cast."

from the definition for synchronized media https://www.w3.org/TR/WCAG21/#dfn-synchronized-media

"audio or video synchronized with another format for presenting information and/or with time-based interactive components"

which again to me (since "video" also links to its own definition that talks purely about the image/visual aspect) suggests that a live video with live audio falls under this SC.

mraccess77 commented 5 years ago

G203: Using a static text alternative to describe a talking head video

patrickhlauke commented 5 years ago

if you're pointing at that to make the point that "video" means "video + audio"... WCAG specifically makes distinction of audio and video (see also "video-only" SCs). yes, in common parlance, when talking about "video", people mean the combination of moving image possibly with audio, but this is not the case for 1.2.4.

Note also that G203 doesn't talk about a LIVE video (with audio), but prerecorded video.

johnfoliot commented 5 years ago

+1 to Patrick's thinking that this should all be addressed in Silver.

While we are at that, I'd also like to ensure that "video" actually be re-named "multi-media" (or expanded to "video and other multi-media sources"), which would also include other forms of animation... as Patrick described it "moving pictures with (or without) audio". For while I note the definition of video states:

video

the technology of moving or sequenced pictures or images. NOTE: Video can be made up of animated or photographic images, or both.

...it has been my observation that many evaluators take "video" to also be synonymous with "mp4" (which is a false assertion). I could argue that animated gifs might be in scope, as well as animated SVG (https://en.wikipedia.org/wiki/SVG_animation) content.

Additionally, I was in conversation with a developer earlier this week who is working on a standardized 'bot' language (https://www.w3.org/community/conv/), and his PoC can be scripted such that the bot can provide 'animated' responses to input ("Avatar asks 'which door', and when you say left it goes to the left door..."). Is that a video? Does it need audio description?

JF


awkawk commented 5 years ago

Huge -1 on renaming video to "multi-media". We already have terms that cover the non-video case and multimedia has been too ambiguous a term to use.

johnfoliot commented 5 years ago

AWK,

(or expanded to "video and other multi-media sources")?

JF


patrickhlauke commented 5 years ago

i am for keeping "video" to mean "moving images" without any audio. but would be good to reinforce that it does NOT mean "moving images with an audio track". and being careful in examples etc that we specify explicitly "video with audio" when we mean that (e.g. that G203 and the "video" of a talking head...clearly "video + audio" in that case).

patrickhlauke commented 5 years ago

but in any case, main point is: it shouldn't be that if i want to just have a live audio stream, i'm exempt from having to provide live captions, but as soon as i add even just a static video image feed of me sitting in my studio while doing this live audio stream, i all of a sudden have to provide captions. seems topsy turvy, effort-wise (on the other hand, makes for good advice if a client asks "how can we avoid having to do captions for our live podcast video?" and you can say "just omit the video part, boom...exempt at AA!")

bruce-usab commented 5 years ago

We tried like forever to keep the term "multi-media", so I agree that revisiting the term would be beating a dead horse. Synchronized media is just one syllable more, so I can live with that.

> In short, there's currently a weird imbalance in requiring live audio+video to have captions at AA (SC 1.2.4), while for an audio-only live stream the requirement is only at AAA.

I respectfully disagree, as requiring an audio-only stream to have a visual aspect (i.e., captions) is more onerous than requiring a video+audio stream to have certain conditions on its visual aspect (i.e., captions).

> it's technically just as onerous to provide live captions for either audio only or audio+video live content

I agree that the main difficulty (expense) is arranging for the CART, and that, next to this main difficulty, the requirement to figure out how to provide a link to the live captions from a nominally audio-only source is trivial.

But IMHO this still seems like requiring a fundamental alteration for too many use cases for it to be anything other than AAA. Some not-quite-the-web-but-soon examples: VOIP phones, Amazon Echo.

> As soon as they turn on their camera as well to show their face while vlogging, all of a sudden they're failing the live captions requirement.

I am okay with that!

> as soon as i add even just a static video image feed of me sitting in my studio while doing this live audio stream, i all of a sudden have to provide captions

I am not convinced that a static image paired with audio necessarily requires captioning, even if the audio is prerecorded.

For me, whether the expected/default player for the media necessarily supports moving video (or not) is the distinction between AA and AAA.

patrickhlauke commented 5 years ago

> I respectfully disagree, as requiring an audio-only stream to have a visual aspect (i.e., captions) is more onerous than requiring a video+audio stream to have certain conditions on its visual aspect (i.e., captions)

then we'll agree to disagree (the captions don't necessarily need a "video" visual aspect. they can be sent as a separate synchronised text stream that's presented client side, or similar technologies. and yes, because the aim is to help users who cannot hear the audio, it will end up having to have a "visual aspect").

> I am okay with that

And I am not, as it has nothing to do with what the purpose of the captions actually is. Somebody who's deaf/hard of hearing won't be happy to accept "oh, it's an audio-only live stream, of course I shouldn't expect some form of captions or synchronised text" versus "they also show a video of the studio, how dare they not also add captions?"

> Some not-quite-the-web-but-soon examples: VOIP phones

versus Skype etc video phones?

> Amazon Echo

as the device itself has no display, of course it wouldn't be applicable to say it needs captions...or am I missing something?

> I am not convinced that a static image paired with audio necessarily requires captioning, even if the audio is prerecorded.

Static as in the camera is fixed, and it just shows me sitting at my desk. Not a still image. (yes, if it were truly just a still image, and we were talking about prerecorded, the audio wouldn't be "synchronized" to it, so would be exempt from falling under SC 1.2.2 Captions)

patrickhlauke commented 5 years ago

from the original discussion on the mailing list, one of the reasons why live audio only captions were set to AAA and live audio in sync'd media (read video, for instance) was set to AA was that at the time there was a large amount of live audio only, so it would have been too big an ask to set to AA. i'd argue that nowadays, there's just as much live audio+video being produced on the web (not just by big corporations with lots of resources; even individuals now have access to easy live video+audio streaming)...so the onus of having this at AA is too high. not saying live audio captions should be moved to AA ... on the contrary, saying live audio+video should be moved to AAA.

awkawk commented 5 years ago

@johnfoliot

> (or expanded to "video and other multi-media sources")?

What is missing from the current definition of video? https://www.w3.org/TR/WCAG21/#dfn-video

Video: the technology of moving or sequenced pictures or images. NOTE: Video can be made up of animated or photographic images, or both.

bruce-usab commented 5 years ago

> then we'll agree to disagree

Nothing wrong with that!

> it will end up having to have a "visual aspect"

Agreed, but requiring certain audio-only media to have a visual component would be a fundamental alteration. For example, I think nowadays .MP3 can be used for live streaming audio, and while it provides for descriptive identification (in text), I am pretty sure the format does not support streaming text.

> Somebody who's deaf/hard of hearing won't be happy to accept "oh, it's an audio-only live stream, of course I shouldn't expect some form of captions or synchronised text" versus "they also show a video of the studio, how dare they not also add captions?"

This is a good example of why this is something for Silver, since this is a fair observation from the broad perspective of the user. With 2.x, we are drafting requirements from a pretty narrow perspective of the content. I also have the advantage of being in a situation where even if this kind of inequity is not covered by 508/WCAG, it is covered by other laws (504/ADA).

> versus Skype etc video phones?

I am thinking about the (audio-only) media used by a typical office desk phone, so no support for video.

> as the device itself has no display, of course it wouldn't be applicable to say it needs captions

We are writing requirements for the content. So how could audio-only web content fed to an Echo (or a desk phone) conform? How would you phrase that sort of exception?

> Static as in the camera is fixed, and it just shows me sitting at my desk.

Okay, but that is still video. We do have some exceptions written for web cams delivering sensory experiences, so there might be that sort of loophole.

bruce-usab commented 5 years ago

> on the contrary, saying live audio+video should be moved to AAA.

👎 That's breaking backwards-compatibility.

I agree that audio+video is now cheap and easy, and that captioning is pretty much just as expensive and difficult as ever. I don't think that reality is enough to justify relaxing accessibility requirements.

patrickhlauke commented 5 years ago

> We are writing requirements for the content. So how could audio-only web content fed to an Echo (or a desk phone) conform? How would you phrase that sort of exception?

It depends then what you mean here with the use case of the Echo / content being fed to it. You're talking about the Echo being a use case, but the Echo as a target platform only outputs audio. How is this audio "sent" to the Echo? As a text stream that's then converted live, or as live audio? If the former, it's not "live audio" arguably? And if this is in the context of content that's also presented to other user agents, like browsers, the requirement on the content is still there. Echo seems to be a red herring, as it's a specific user agent? Or are we talking at cross purposes? IF the content was created specifically for the Echo, and only ever delivered to the Echo, the requirement for providing an alternative to the audio would be irrelevant, no?

patrickhlauke commented 5 years ago

> 👎 That's breaking backwards-compatibility.

To me, this aspect is not as sacrosanct as for others. It's actually hampered us in many ways in looking more fundamentally at certain aspects...

> I agree that audio+video is now cheap and easy, and that captioning is pretty much just as expensive and difficult as ever. I don't think that reality is enough to justify relaxing accessibility requirements.

If the original reason for one being AA and the other being AAA was the reality at the time, it's "interesting" then that the reality of 2008 (!) is now still the norm ...

patrickhlauke commented 5 years ago

> Agreed, but requiring certain audio-only media to have a visual component would be a fundamental alteration. For example, I think nowadays .MP3 can be used for live streaming audio, and while it provides for descriptive identification (in text), I am pretty sure the format does not support streaming text.

depends how it's delivered. if you're using an <audio> element to present it in your web content, then you can add a <track> with the second separate stream for the captions - the same way that you'd do it for a <video> (as you don't necessarily need/want to make the captions open/burnt into the video part of the audio+video stream itself). it's not about the format itself needing to encapsulate all the streams of information, but that the stuff is presented on the web page together (with an appropriate player).
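
As a rough illustration of the `<audio>` plus `<track>` approach described above, here is a minimal Python sketch that only generates the markup. The function name `audio_with_captions`, the example.com URLs and the label text are assumptions for illustration; none of them come from WCAG or from this thread. Two caveats: as far as I know, browsers' native `<audio>` controls generally do not render caption cues themselves, so the page would likely need script to display them, and a live stream would need cues delivered as they are produced rather than read from a static WebVTT file.

```python
from html import escape


def audio_with_captions(audio_src: str, captions_src: str,
                        lang: str = "en",
                        label: str = "English captions") -> str:
    """Return markup for an <audio> player with a separate captions <track>.

    All argument values are hypothetical; the captions are a second resource
    referenced from the page, not something embedded in the audio format.
    """
    # quote=True also escapes double quotes, so values are safe in attributes.
    audio = escape(audio_src, quote=True)
    captions = escape(captions_src, quote=True)
    srclang = escape(lang, quote=True)
    text_label = escape(label, quote=True)
    return (
        f'<audio controls src="{audio}">\n'
        f'  <track kind="captions" src="{captions}" '
        f'srclang="{srclang}" label="{text_label}" default>\n'
        f'</audio>'
    )


# Hypothetical URLs, for illustration only.
print(audio_with_captions("https://example.com/live.mp3",
                          "https://example.com/live.vtt"))
```

The point of the sketch is only that the audio and the caption text are two separate resources tied together by the page, which is the same pattern used with `<video>`.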

patrickhlauke commented 5 years ago

> We do have some exceptions written for web cams delivering sensory experiences, so there might be that sort of loophole.

not normatively in WCAG 2.1 though, unless i'm missing it

bruce-usab commented 5 years ago

> It depends then what you mean here with the use case of the Echo / content being fed to it.

I was thinking of an Echo playing a live audio-only stream. That stream cannot be captioned, but I am okay with that, since this is a AAA requirement.

> If the original reason for one being AA and the other being AAA was the reality at the time...

I do not concur that this was the deciding factor for 1.2.9 being AAA.

> not normatively in WCAG 2.1 though, unless i'm missing it

I should have been more specific. I am thinking about an unattended traffic/beach/zoo camera. I am of the opinion that this sort of use qualifies for 1.1.1 allowance for descriptive identification (as compared to text equivalence) because it is primarily intended to create a specific sensory experience (i.e., eye-in-the-sky).

patrickhlauke commented 5 years ago

> I was thinking of an Echo playing a live audio-only stream. That stream cannot be captioned, but I am okay with that, since this is a AAA requirement.

I'd note that, unless we stretch the definition to breaking point, the audio stream itself, in isolation, is not "web content". it's only when it's presented as part of a web page, in a player in the page, that it is affected by WCAG.

bruce-usab commented 5 years ago

@patrickhlauke, a frequent reason (maybe the most common reason) an SC ended up at AAA instead of AA is because we could think of use cases where meeting the SC was not technically feasible. I think my examples are at least close enough to proving the point that requiring audio-only streaming media to be captioned is something that we should be cautious with.

OTOH, this conversation has reminded me that I find it a bit odd that no one thinks twice about the web page having the alternative text for an image (as opposed to the .jpeg or .gif file somehow containing the descriptive text). But for synchronized media, WCAG has this structure where there is a strong implication that the media contains the captioning or AD tracks directly.

With that kind of framing, I think I could be swayed that live audio streams being captioned should be AA. Within the U.S. Federal sphere of influence, we are there already, but that’s because of 504 and ADA, and not 508 and WCAG.

johnfoliot commented 5 years ago

Bruce wrote:

> ...an SC ended up at AAA instead of AA is because we could think of use cases where meeting the SC was not technically feasible.

While that was certainly one reason, it wasn't the only one. Cost and impact on the content creator was another: if achieving the goal was considered too onerous on the content creator, then that was a considering factor as well. There is no point setting a requirement that most content sites would ignore or fail, and thus fail all of WCAG in the process (given the W3C's conformance model for WCAG).

JF


patrickhlauke commented 5 years ago

> if achieving the goal was considered too onerous on the content creator, then that was a considering factor as well

interesting then that having to provide live captions for live audio+video wasn't considered too onerous, while live captions for live audio alone was... (and here all i can think of is that back in 2009, it wasn't all that common to provide live audio+video, as it was a right technical rigmarole to do that kind of live multimedia streaming, whereas nowadays the technical hurdle is almost nonexistent and anybody can do it)

patrickhlauke commented 5 years ago

in any case, i suspect that unless it's a large corporate doing the live audio+video streaming, it simply won't be financially viable for "joe public" who just wants to add a twitch stream or something to also go for live captions, so they'll simply ignore/out-of-scope that particular page/part of their site from any AA claim...i know i would

bruce-usab commented 5 years ago

> interesting then that having to provide live captions for live audio+video wasn't considered too onerous

Too onerous for A, but not too onerous for AA. I think we hit the right balance. In contrast to AD, captioning is clearly essential. Transcripts are often preferred over AD, whereas almost no one prefers a transcript over captioning.

It boggles my mind that Canada got away with mostly exempting 1.2.5. That would never fly in the U.S. (nor should it).

> while live captions for live audio alone was [considered too onerous]

Again, I think this reflects the format shift, and not the work required. In the context of non-web audio-only live media (e.g., radio and telephone conferencing), deafness/HoH is a significant impediment that is only imperfectly addressed. Given this reality, it seems reasonable to me that, in the context of web audio-only live media, captioning is a AAA requirement.

bruce-usab commented 5 years ago

@johnfoliot wrote:

> While that was certainly one reason, it wasn't the only one.

I did not imply it was the only one. Please see the tally I did a while back.

While we spent a great deal of time and energy discussing cost and level of effort, and that was all certainly worthwhile, it turns out -- in the end -- that an SC being onerous (or not) was not so much of a deciding factor when it came to level assignments (or at least not as much of a factor as I think most people assume). We allowed 5/25 (20%) Level A SC that were not "easy". As one would expect, that ratio does increase for AA (7/13, 54%) and AAA (19/23, 82%). Intuitively, when I started that tally, I expected numbers like 5%/50%/95%.

The characteristic AA SC have most in common is that they are not essential (11/13, 85%). The characteristic AAA SC have most in common is that they are not possible for all content (20/23, 87%).
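
The percentages in the tally above can be reproduced with a few lines of Python. The counts are taken from the comment; the dictionary labels and the `percent` helper are illustrative names only. One small observation: 19/23 is about 82.6%, which rounds to 83% where the comment gives 82%.

```python
# (SC with the characteristic, total SC at that level), as quoted in the comment.
tallies = {
    "A, not easy": (5, 25),
    "AA, not easy": (7, 13),
    "AAA, not easy": (19, 23),
    "AA, not essential": (11, 13),
    "AAA, not possible for all content": (20, 23),
}


def percent(part: int, whole: int) -> int:
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


for label, (part, whole) in tallies.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```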

alastc commented 5 years ago

I think the multi-media responsibilities would be useful to take as an example for the conformance work in silver, tagging @johnfoliot