w3c / dapt

Dubbing and Audio description Profiles of TTML2
https://w3c.github.io/dapt/

Support both referenced and inline embedded audio recordings? #115

Open · nigelmegitt opened 1 year ago

nigelmegitt commented 1 year ago
While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.

Originally posted by @nigelmegitt in https://github.com/w3c/dapt/issues/105#issuecomment-1470390924

If we are going to support embedded audio resources, they can either be defined in `/tt/head/resources` and then referenced, or the data can be included inline.

Do we need both options?

Example of embedded:

```xml
<head>
  <resources>
    <audio xml:id="audioRecording1" type="audio/wave">
      <source>
        <data>[base64 encoded audio data]</data>
      </source>
    </audio>
    <data xml:id="audioRecording2" type="audio/wave">
      [base64 encoded audio data]
    </data>
  </resources>
</head>
```

This would then be referenced in the body content using something like (see also #114):

```xml
<audio src="#audioRecording2"/>
```
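
For context, here is a sketch of how such a reference might sit within body content; the `div`/`p` structure follows TTML2, but the timing values and text are illustrative only:

```xml
<body>
  <div>
    <p begin="10s" end="13s">
      <!-- points at the resource embedded in /tt/head/resources -->
      <audio src="#audioRecording2"/>
      <span>Text of the corresponding script event</span>
    </p>
  </div>
</body>
```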

Example of inline:

```xml
<audio type="audio/wave">
  <source type="audio/wave">
    <data>[base64 encoded audio data]</data>
  </source>
</audio>
```
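
And the equivalent inline form placed directly in body content (again a sketch; the timings and text are illustrative, not taken from the spec):

```xml
<body>
  <div>
    <p begin="10s" end="13s">
      <!-- audio data carried inline, no head/resources entry needed -->
      <audio type="audio/wave">
        <source type="audio/wave">
          <data>[base64 encoded audio data]</data>
        </source>
      </audio>
      <span>Text of the corresponding script event</span>
    </p>
  </div>
</body>
```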
mattsimpson3 commented 1 year ago

On this thread of issues I think support for embedded recordings would be extremely useful - it avoids the need for zipped / archived side-car transportation of a string of tiny audio files. As for inline or referenced - my gut feeling is that referenced gives more scope for compression (in the unlikely event that the same audio is used more than once), but this feels like an uncommon use case.
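
For illustration, that reuse case would look something like this sketch, where a single embedded resource is referenced from two places so its data is stored only once (the `xml:id` value and timings are hypothetical):

```xml
<tt xmlns="http://www.w3.org/ns/ttml">
  <head>
    <resources>
      <data xml:id="sharedRecording" type="audio/wave">[base64 encoded audio data]</data>
    </resources>
  </head>
  <body>
    <div>
      <!-- the same embedded resource referenced from two script events -->
      <p begin="0s" end="2s"><audio src="#sharedRecording"/></p>
      <p begin="30s" end="32s"><audio src="#sharedRecording"/></p>
    </div>
  </body>
</tt>
```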

nigelmegitt commented 1 year ago

This made me think about performance issues. In the server-side or authoring domain, parsing and loading performance is unlikely to be a significant factor, but in a distribution/client playback scenario, having a bunch of large audio resources embedded in the head of a document means the parser has to get past all of them before reaching the timed text data, which might be the most important thing.

Parsers that wait until they have parsed the whole document won't care where within the document the embedded data is, though.

mattsimpson3 commented 1 year ago

I did have a similar, if not so eloquent, thought; it also makes any packetisation of the stream simpler for distribution.