w3c / dapt

Dubbing and Audio description Profiles of TTML2
https://w3c.github.io/dapt/

How would extended AD be supported? #71

Open nigelmegitt opened 2 years ago

nigelmegitt commented 2 years ago
    @nigelmegitt - audio in the case of dub/vo, but for AD, the video is the important association!

Not sure if you have considered 'Extended AD' - where the AD presentation of the original media may cause the original media to pause at certain points for the AD to complete before playing again. https://www.w3.org/TR/WCAG20-TECHS/G8.html#:~:text=Extended%20audio%20description%20temporarily%20pauses,are%20insufficient%20for%20adequate%20description.

If not, then it's probably a DAPT wishlist item, as I'm not aware of good examples of preparing such a thing at the moment. It's not something we've implemented yet. A method of representing it is evading me so far (although I've not thought too hard about it yet).

Originally posted by @btsimonh in https://github.com/w3c/dapt/issues/45#issuecomment-1250026733

nigelmegitt commented 2 years ago

My thoughts on this:

The extension that Extended AD gives relative to "ordinary" AD is that it allows the audio description resource to be longer than the period within the related media that is set aside for playback of that resource.

In other words, within, say, a 2s period of media there is a 5s duration description.

For Text to Speech applications, it is very often not known at authoring time exactly how long the speech will last.

For pre-recorded audio, the duration of each description is known exactly, after it has been recorded.

The current semantics of TTML2 playback of audio is that when the end time of the <span> (for TTS) or <audio> element is reached, playback is stopped. This is independent of how far through playback of its audio resource it has reached: if it has reached the end, fine; if it has not, it will be truncated.
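To make the truncation concrete, here is a minimal Python sketch. The function name `played_seconds` and its parameters are made up for illustration and only model the behaviour described above; they are not TTML2 or DAPT API.

```python
def played_seconds(begin: float, end: float, resource_duration: float) -> float:
    """Seconds of the audio resource actually heard under the current semantics.

    Playback stops when the element's end time is reached, so anything
    beyond the active duration (end - begin) is cut off.
    """
    active_duration = max(end - begin, 0.0)
    return min(resource_duration, active_duration)


# A 5s description in a 2s window is truncated to 2s.
print(played_seconds(begin=10.0, end=12.0, resource_duration=5.0))
# A 1.5s description in the same window plays in full.
print(played_seconds(begin=10.0, end=12.0, resource_duration=1.5))
```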

From a player perspective, it is not obvious whether any truncation is intended or accidental, so we cannot just set a playback flag saying "until audio playback is complete, pause media timeline", because that would be undesirable for AD files that were authored to rely on the truncation behaviour. Relying on truncation could be discouraged, of course.

It probably makes sense then to add some syntax whose semantic is:

  1. "the actual duration of this audio element needs to be extended to [duration]" (comes into play when the "opportunity" time, i.e. the active duration of the <audio> element, is less than [duration]) and
  2. (possibly) "if you have to pause the media, do it at time M".

A player flag saying "honour extended durations" would then enable this behaviour: the player would pause for max(duration - opportunity, 0) at either (1) the defined pause time M or (2) some implementation-defined time.
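A rough Python sketch of that pause calculation follows. Nothing here is existing TTML2 or DAPT syntax: `extended_duration`, `pause_at` and `honour_extended` are hypothetical names standing in for [duration], M and the player flag, and falling back to the end of the window is just one possible implementation-defined choice.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Description:
    begin: float                      # start of the "opportunity" window, in seconds
    end: float                        # end of the "opportunity" window, in seconds
    extended_duration: float          # hypothetical: the requested [duration]
    pause_at: Optional[float] = None  # hypothetical: the optional pause time M


def plan_pause(desc: Description, honour_extended: bool) -> Optional[Tuple[float, float]]:
    """Return (media time to pause at, pause length), or None if no pause is needed."""
    if not honour_extended:
        return None
    opportunity = desc.end - desc.begin
    pause_length = max(desc.extended_duration - opportunity, 0.0)
    if pause_length == 0.0:
        return None
    # Assumption of this sketch: with no pause time M, pause at the end of the window.
    pause_time = desc.pause_at if desc.pause_at is not None else desc.end
    return (pause_time, pause_length)


# The earlier example: a 5s description in a 2s window needs a 3s pause.
print(plan_pause(Description(begin=10.0, end=12.0, extended_duration=5.0), True))
```

With the flag off, `plan_pause` returns `None`, which leaves files that rely on truncation playing as they do today.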

For text to speech playback, the pause time would probably need to be at the end of the "opportunity" window, while the player waits until the text to speech system fires a "completed utterance" event.
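As a sketch of how a player might do that, assuming a speech engine exposed as an awaitable `speak()` that returns when the utterance completes (a stand-in for the "completed utterance" event) and hypothetical `pause_media` / `resume_media` callbacks:

```python
import asyncio
from typing import Awaitable, Callable


async def play_tts_description(
    speak: Callable[[], Awaitable[None]],  # hypothetical: resolves on "completed utterance"
    pause_media: Callable[[], None],       # hypothetical player callback
    resume_media: Callable[[], None],      # hypothetical player callback
    opportunity: float,                    # length of the opportunity window, in seconds
) -> float:
    """Speak one description; pause the media only if speech overruns the window.

    Returns the number of seconds the media was paused for.
    """
    loop = asyncio.get_running_loop()
    utterance = asyncio.ensure_future(speak())
    # Media and speech run together for the length of the opportunity window.
    done, _pending = await asyncio.wait({utterance}, timeout=opportunity)
    if utterance in done:
        utterance.result()  # re-raise any error from the speech engine
        return 0.0
    # End of the window reached and speech is still going: pause and wait.
    pause_media()
    paused_at = loop.time()
    try:
        await utterance
    finally:
        resume_media()
    return loop.time() - paused_at
```

The pause length is not known in advance here; it is simply however long the utterance overruns the window.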

If we're adding syntax and semantics, it would be ideal to do that in TTML2, but it may be more practicable from a standardisation perspective to do it in DAPT directly.

nigelmegitt commented 1 year ago

@btsimonh I've proposed in #118 informative text that says implementations can support extended descriptions by varying the play rate of the audio description audio or of the programme audio so that there is enough time to play all the audio description within the time interval allowed. The text also says that this is implementation-defined behaviour and is not specified.

e.g. player could:

This stops short of any syntax but hopefully shows a route forward, and if we find there is a practical requirement to add more syntax or semantics later, then we still can.
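For illustration only, the two play-rate options mentioned above come down to simple ratios. The function names below are made up, and a real player would presumably also cap the rates so that speech stays intelligible.

```python
def ad_rate_to_fit(audio_duration: float, window: float) -> float:
    """Rate at which to play the description audio so that it fits the window.

    Returns 1.0 when the audio already fits at normal speed.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return max(audio_duration / window, 1.0)


def programme_rate_to_fit(audio_duration: float, window: float) -> float:
    """Rate at which to play the programme so that the window lasts as long as the audio.

    Returns 1.0 when the audio already fits at normal speed.
    """
    if audio_duration <= 0:
        raise ValueError("audio_duration must be positive")
    return min(window / audio_duration, 1.0)


# A 5s description in a 2s window: speed the description up,
# or slow the programme down for the length of the window.
print(ad_rate_to_fit(5.0, 2.0))
print(programme_rate_to_fit(5.0, 2.0))
```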