Create Audio and video alternatives guideline

WilcoFiers commented 10 months ago

Previous issues: w3c/silver#360, w3c/silver#378, w3c/silver#404, w3c/silver#415, w3c/silver#421, w3c/silver#474

Open questions

Is speaker identification in captions required?
How can the accuracy of captions be assessed / graded? How should different inaccuracies be measured, such as missing or inaccurate words in the transcript, missing audio cues, missing speaker identification, or missing punctuation needed to convey emphasis, sentence endings, sarcasm, etc.?
Should closed captions be emphasized over open captions, as closed captions tend to be more accessible? Some benefits include that they may be customizable to a user's preferences, and that it may be possible to route them to a braille display. Perhaps open captions should not even be allowed on new video material?
Should non-essential videos such as advertisements and promotional videos be given the same weight as videos that serve as the primary content?
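
As one possible starting point for the grading question above, missing or inaccurate words could be scored with a word error rate (WER): the word-level edit distance between a trusted reference transcript and the caption text, divided by the number of reference words. The sketch below is only an illustration under that assumption. The function name `word_error_rate` is hypothetical, nothing in this issue prescribes WER, and the metric does not cover speaker identification, audio cues, or punctuation, which would need separate checks.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: both inputs are plain text; case is ignored and words are
    split on whitespace. Punctuation is NOT stripped or scored separately.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words handled so far
    # and the first j caption words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # caption is missing a reference word
                curr[j - 1] + 1,     # caption has an extra word
                prev[j - 1] + cost,  # words match, or one was substituted
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is missing from the caption.
    print(word_error_rate("the quick brown fox", "the quick fox"))
```

A single ratio like this treats every word as equally important, so a dropped "not" counts the same as a dropped "um"; any grading scheme built on it would likely need weighting or human review on top.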

Finnberrys commented 9 months ago

Speaker identification: I think this should be required, where it is possible to identify the speaker. Understanding the conversation or situation is often affected by who said what. If the audience cannot identify the who, it can make it difficult to follow a story or issue. Identifying the speaker can also be critical for people with face-blindness. This affects me. I usually identify a person by their voice. If I couldn't hear (either due to a hearing impairment or being in a situation where my device must be on mute) I would find it very difficult to follow a conversation. If the captions identify the speaker, this would help. For the same reason, I think it would help a blind person (accessing transcript or captions via Braille). Without speaker identification, it would be like reading a playscript but with no characters identified.

mraccess77 commented 9 months ago

Whether advertising videos should be exempt could be left to regulators, as there are clear situations where discrimination can occur when a video is not accessible, such as in fair housing or healthcare. On a related note, people used to say audio description was not needed on entertainment videos - but now people are starting to realize that entertainment is a right as well - just as product selection and comparison is.

u9000-Nine commented 7 months ago

If the audience cannot identify the who, it can make it difficult to follow a story or issue. Identifying the speaker can also be critical for people with face-blindness.

Of note is that speaker identification based on caption location would not help someone with face-blindness.

u9000-Nine commented 7 months ago

Open captions: Open captions are literally images of text, so they currently fail 1.4.5 Images of text (AA) if they have alt text. If they don't have alt text,* then they also currently fail 1.1.1 Non-text content (A).

* As far as I'm aware, no open captions have alt text because that's not a thing for video frames.

awkawk commented 7 months ago

Open captions: Open captions are literally images of text, so they currently fail 1.4.5 Images of text (AA) if they have alt text. If they don't have alt text,* then they also currently fail 1.1.1 Non-text content (A).

Open captions are part of time-based media so don't need alternative text.

u9000-Nine commented 7 months ago

Open captions are part of time-based media so don't need alternative text.

Only if "text alternatives at least provide descriptive identification of the non-text content. (Refer to Guideline 1.2 for additional requirements for media.)". Guideline 1.2 says the following under "Prerecorded Video-Only":

Either an alternative for time-based media or an audio track is provided that presents equivalent information for prerecorded video-only content.

So open captions do currently pass Level A if there is also a transcript or an audio track in the same language as the captions.

Since captions are explicitly an alternative for audio, I think open-captioned videos should at least be required to have a textual transcript as well in WCAG 3.

(See also #20)

awkawk commented 7 months ago

Open captions are part of time-based media so don't need alternative text.

Only if "text alternatives at least provide descriptive identification of the non-text content. (Refer to Guideline 1.2 for additional requirements for media.)". Guideline 1.2 says the following under "Prerecorded Video-Only":

The video would fail 1.1.1 if there wasn't a text alternative to identify the video, yes. But there is no requirement for the open captions to have "alternative text". The open captions are part of the video, so the alternative for the video is what is required at Level A.

Either an alternative for time-based media or an audio track is provided that presents equivalent information for prerecorded video-only content.

DJ, you mention this in a discussion on open captions, but this is for video-only, so there wouldn't be open captions as there is no audio. Of course, there can be on-screen text, but that is just text on the screen (e.g., titles) but wouldn't be captions as captions (both open and closed) are equivalents for audio information.

So open captions do currently pass Level A if there is also a transcript or an audio track in the same language as the captions.

Open captions pass level A, full stop. Of course there is an audio track that the open captions are the equivalent for - that's why they are captions. The transcript for audio information is not required until 1.2.8.

u9000-Nine commented 7 months ago

Only if "text alternatives at least provide descriptive identification of the non-text content. (Refer to Guideline 1.2 for additional requirements for media.)". Guideline 1.2 says the following under "Prerecorded Video-Only":

The video would fail 1.1.1 if there wasn't a text alternative to identify the video, yes. But there is no requirement for the open captions to have "alternative text". The open captions are part of the video, so the alternative for the video is what is required at Level A.

That is fair.

Either an alternative for time-based media or an audio track is provided that presents equivalent information for prerecorded video-only content.

DJ, you mention this in a discussion on open captions, but this is for video-only, so there wouldn't be open captions as there is no audio. Of course, there can be on-screen text, but that is just text on the screen (e.g., titles) but wouldn't be captions as captions (both open and closed) are equivalents for audio information.

So open captions do currently pass Level A if there is also a transcript or an audio track in the same language as the captions.

Open captions pass level A, full stop. Of course there is an audio track that the open captions are the equivalent for - that's why they are captions. The transcript for audio information is not required until 1.2.8.

I have seen videos with open captions and nonaural audio before. Usually they are instructional videos with some royalty-free music in the background. I agree that open captions currently pass when there is also aural audio, but I know that there are videos which don't have such audio. In those situations, the information provided by the captions is effectively video only time-based media, so they would not currently conform.

That is beside my main point though, which is that I think open-captioned videos should at least be required to have a textual transcript as well in WCAG 3.

Apologies, I should have explicitly mentioned captioned videos with music-only audio tracks. I incorrectly assumed that everyone was familiar with those.

awkawk commented 7 months ago

I have seen videos with open captions and nonaural audio before. Usually they are instructional videos with some royalty-free music in the background. I agree that open captions currently pass when there is also aural audio, but I know that there are videos which don't have such audio. In those situations, the information provided by the captions is effectively video only time-based media, so they would not currently conform.

I'm confused about "aural audio" - aural refers to sounds heard through the ear, so what other kind of audio are you thinking of?

If a video doesn't have audio then it wouldn't need to have captions (although a "(no audio)" caption at the start is good to have). I'm confused about what you are saying wouldn't conform in a video-only file that had open captions. In my example above, if I provide a single caption to indicate that there is no audio, you're saying that triggers a failure?

That is beside my main point though, which is that I think open-captioned videos should at least be required to have a textual transcript as well in WCAG 3.

OK, that is clear enough.

Apologies, I should have explicitly mentioned captioned videos with music-only audio tracks. I incorrectly assumed that everyone was familiar with those.

You're talking about a video with music only and the captions are just musical notes to represent the non-spoken content? If so, sure I'm familiar with that. If not, I'm not sure what you mean.

u9000-Nine commented 7 months ago

I'm confused about "aural audio" - aural refers to sounds heard through the ear, so what other kind of audio are you thinking of?

My apologies. I meant "oral audio", but I should have just written "spoken audio".

(Plain Language is hard for me in English so I sometimes mess it up. I am committing to simplifying my language when possible, though, and am always willing to explain of course. Thank you for asking for clarification. 🙂)

Finnberrys commented 7 months ago

If the audience cannot identify the who, it can make it difficult to follow a story or issue. Identifying the speaker can also be critical for people with face-blindness.

Of note is that speaker identification based on caption location would not help someone with face-blindness.

I find it helpful because then I know who is speaking. It reminds me who is who. Sometimes people or characters look very similar and I can't tell them apart. Having the captions identify them with a name helps me to know who it is that is speaking. It's similar to being in a Teams meeting as opposed to an in-person meeting. On Teams, everybody has their name visible, which helps me to identify who is speaking.

w3c / wcag3

Create Audio and video alternatives guideline #23