w3c / adpt

Audio Description Profile of TTML2
https://w3c.github.io/adpt/

Constrain to one leaf element being active for audio at any one time? #8

Closed: nigelmegitt closed this 1 year ago

nigelmegitt commented 5 years ago

Currently there is no constraint on the number of leaf elements whose audio can be active simultaneously. This could cause implementation difficulties if a large number of audio routes have to be connected up. For the audio description use case I'm not aware of any need to have more than one description playing at any one time, so we should consider adding this as a constraint.

For other use cases of similar profiles (object based audio, perhaps) the constraint would not be applicable.
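A minimal sketch of how the proposed constraint could be checked, assuming each leaf's active interval has already been resolved to document time in seconds and is treated as half-open, [begin, end). The `Leaf` type and the function names are hypothetical and not part of ADPT.

```python
from typing import NamedTuple


class Leaf(NamedTuple):
    """A leaf content element with a resolved active interval [begin, end) in seconds."""
    xml_id: str
    begin: float
    end: float


def max_active_leaves(leaves: list[Leaf]) -> int:
    """Return the largest number of leaves active at the same instant (sweep line)."""
    events = []
    for leaf in leaves:
        events.append((leaf.begin, 1))   # leaf becomes active
        events.append((leaf.end, -1))    # leaf stops being active
    # Tuples sort by time, then by delta, so at equal times ends (-1) are
    # processed before begins (+1) and back-to-back leaves do not overlap.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def satisfies_one_leaf_constraint(leaves: list[Leaf]) -> bool:
    """True if no more than one leaf is ever active at once."""
    return max_active_leaves(leaves) <= 1


sequential = [Leaf("ad1", 0.0, 5.0), Leaf("ad2", 5.0, 9.0)]
overlapping = [Leaf("ad1", 0.0, 5.0), Leaf("ad2", 4.0, 9.0)]
print(satisfies_one_leaf_constraint(sequential))
print(satisfies_one_leaf_constraint(overlapping))
```

Processing end events before begin events at the same instant means one description can finish exactly as the next one starts without being counted as an overlap.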

simpson-matt commented 5 years ago

I think there could be fringe or future use cases where this may be handy, but they would probably need more definition than simply allowing overlap here. I don't think a constraint here is likely to cause any major issues at the moment: a well-constructed AD file should not need this kind of overlap.

nigelmegitt commented 5 years ago

From @simpson-matt's comment and from the support during the call today, I'm declaring that we have consensus to apply this constraint, and will go ahead and create a pull request for it.

btsimonh commented 3 years ago

See the use case highlighted in https://github.com/w3c/adpt/issues/22, where a single AD element is constructed from two or more audio segments, which may or may not come from the same audio file.

But also be aware that this could add the further complexity of multiple copies of the background audio appearing during the overlap if the TTML is not structured correctly.

But whilst writing this, I realise that we could still constrain "the number of leaf elements whose audio can be active simultaneously" (at least for this AD use case): for example, using multiple simultaneous audio files does not break this constraint if they share a parent:

      <p xml:id="ad31b" begin="30s" end="40s">
        <animate begin="0.0s" end="0.12s" tta:gain="1;0.39"/>
        <animate begin="9.8s" end="10s" tta:gain="0.39;1"/>
        <span>
          <audio src="DRAD182Y01.wav" clipBegin="35.68s" clipEnd="37.72s" begin="0.12s" end="2.16s">
            <animate begin="2.0s" end="2.04s" tta:gain="1;0"/>
          </audio>
          <audio src="DRAD182Y01.wav" clipBegin="35.68s" clipEnd="37.72s" begin="2.1s" end="5s">
            <animate begin="0s" end="0.04s" tta:gain="0;1"/>
          </audio>
          Nick takes a drag of his cigarette. (twice)</span>
      </p>

@nigelmegitt, is the above valid? (P.S. I'm not sure about the internal animate times, i.e. whether they are relative to the audio's begin or the p's begin, but you get the idea.)

We do have more complex use cases, but these get very messy very quickly. Certainly one (non-AD) use case for multiple leaves is where you have a static leaf with gain 1 valid for the complete duration, and all audios are within a separate branch with gain 0, so that those dynamic leaves don't receive any original audio, simplifying overlaps.

br, simon hailes
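To make the ducking in the example concrete, here is a sketch of the programme gain produced by the two p-level animate elements, in seconds relative to the p's begin (30s in document time). It assumes linear interpolation between the two tta:gain values, and that the first animation holds 0.39 until the second one begins; in TTML2 that holding behaviour is what fill="freeze" requests, and the snippet does not set fill. The gain animates inside each audio element are not modelled, and all names are hypothetical.

```python
def interpolate(t, begin, end, start_value, end_value):
    """Linearly interpolate an animated value over [begin, end)."""
    fraction = (t - begin) / (end - begin)
    return start_value + (end_value - start_value) * fraction


# (begin, end, from, to) for the two <animate> elements on p "ad31b",
# in seconds relative to the p's begin.
GAIN_ANIMATIONS = [(0.0, 0.12, 1.0, 0.39), (9.8, 10.0, 0.39, 1.0)]


def programme_gain_at(t, animations=GAIN_ANIMATIONS, base=1.0):
    """Gain applied to the programme audio at p-relative time t.

    Assumes animations do not overlap and each one holds its final value
    after it ends (the effect fill="freeze" would request).
    """
    value = base
    for begin, end, start_value, end_value in sorted(animations):
        if t < begin:
            break
        if t < end:
            return interpolate(t, begin, end, start_value, end_value)
        value = end_value  # held after the animation finishes
    return value


for t in (0.0, 0.06, 0.12, 5.0, 9.9, 10.0):
    print(f"{t:5.2f}s -> gain {programme_gain_at(t):.3f}")
```

Adding 30s to each time gives the corresponding document time.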

nigelmegitt commented 3 years ago

Thanks for this @btsimonh .

I think your example is showing a single leaf span that plays two audio clips, one beginning at 0.12s and ending at 2.16s (relative to the span's begin time, which in this case is 30s), the other from [2.1s, 5s). In this particular case the two audio clips are the same, coming from the same resource and having the same clipBegin and clipEnd, but it demonstrates the point.
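Resolving those offsets to document time, under the reading above (audio times relative to the span, and the span starting with its parent p at 30s), is simple addition. This sketch also computes how long the two windows overlap, which comes to 60ms, presumably the region the gain animates inside each audio element are meant to cross-fade over. Variable names are illustrative only.

```python
P_BEGIN = 30.0  # begin of p "ad31b" in document time, seconds

# The span has no begin attribute, so it starts when its parent p starts.
span_begin = P_BEGIN

# begin/end of each <audio>, relative to the span, copied from the example.
clips = {"first": (0.12, 2.16), "second": (2.1, 5.0)}

# Resolve each clip's active window to document time.
resolved = {name: (span_begin + b, span_begin + e) for name, (b, e) in clips.items()}

first_begin, first_end = resolved["first"]
second_begin, second_end = resolved["second"]

# Length of the interval during which both clips are active (0 if disjoint).
overlap = max(0.0, min(first_end, second_end) - max(first_begin, second_begin))

for name, (b, e) in resolved.items():
    print(f"{name}: [{b:.2f}s, {e:.2f}s)")
print(f"overlap: {overlap:.2f}s")
```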

I agree that in this case it is possible to play two audio clips, one after the other, without breaking the constraint of "1 active leaf node at a time".

> a static leaf with gain 1 valid for the complete duration, and all audios are within a separate branch with gain 0, so that those dynamic leaves don't receive any original audio, simplifying overlaps.

The following two are functionally identical:

<body>
   <div tta:gain="1"><p xml:id="originalAudio"></p></div>
   <div tta:gain="0"><p xml:id="otherAudio"><audio src="otherAudio.wav" tta:gain="0.7"/></p></div>
</body>

and

<body>
   <div tta:gain="1"><p xml:id="otherAudio"><audio src="otherAudio.wav" tta:gain="0.7"/></p></div>
</body>

so I'm not sure what benefit there would be for the first pattern compared to the second?

btsimonh commented 3 years ago

> so I'm not sure what benefit there would be for the first pattern compared to the second?

Consider:

<body>
   <div tta:gain="1"><p xml:id="originalAudio"></p></div>
   <div tta:gain="0">
    <p xml:id="otherAudio1" begin="1s" end="5s"><audio src="otherAudio1.wav" tta:gain="0.7"/></p>
    <p xml:id="otherAudio2" begin="3s" end="7s"><audio src="otherAudio2.wav" tta:gain="0.7"/></p>
   </div>
</body>

0-1s -> background x 1
1-3s -> background x 1 + otherAudio1.wav x 0.7
3-5s -> background x 1 + otherAudio1.wav x 0.7 + otherAudio2.wav x 0.7
5-7s -> background x 1 + otherAudio2.wav x 0.7

and

<body>
   <div tta:gain="1">
    <p xml:id="otherAudio1" begin="1s" end="5s"><audio src="otherAudio1.wav" tta:gain="0.7"/></p>
    <p xml:id="otherAudio2" begin="3s" end="7s"><audio src="otherAudio2.wav" tta:gain="0.7"/></p>
   </div>
</body>

0-1s -> background x 1
1-3s -> background x 1 + otherAudio1.wav x 0.7
3-5s -> background x 1 + otherAudio1.wav x 0.7 + background x 1 + otherAudio2.wav x 0.7
5-7s -> background x 1 + otherAudio2.wav x 0.7

By adding a second timed audio that overlaps the first, the output becomes very different (and undesirable): during 3-5s the background is mixed in twice.
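A minimal mixing sketch of the two patterns, assuming (as the timelines above do) that a div's tta:gain scales only the programme audio routed into each leaf p, while each audio element's own tta:gain scales its clip. `LeafP` and `mix_at` are hypothetical names. The sketch only sums what active leaves contribute, so it says nothing about periods when no leaf is active, such as 0-1s in the second pattern.

```python
from dataclasses import dataclass, field

INF = float("inf")


@dataclass
class LeafP:
    """A leaf p from the examples above (hypothetical model, times in seconds)."""
    xml_id: str
    begin: float
    end: float
    div_gain: float                            # tta:gain on the enclosing div
    clips: dict = field(default_factory=dict)  # audio src -> tta:gain on the <audio>


def mix_at(t, leaves):
    """Sum the gains reaching the output from every leaf p active at time t.

    Returns (programme audio gain, {clip src: gain}).
    """
    background = 0.0
    clips = {}
    for leaf in leaves:
        if leaf.begin <= t < leaf.end:
            # Each active leaf gets its own copy of the programme audio,
            # scaled by the div gain (assumption taken from the timelines above).
            background += leaf.div_gain
            for src, gain in leaf.clips.items():
                clips[src] = clips.get(src, 0.0) + gain
    return background, clips


# First pattern: untimed programme-audio leaf, clip leaves in a gain-0 div.
first_pattern = [
    LeafP("originalAudio", 0.0, INF, 1.0),
    LeafP("otherAudio1", 1.0, 5.0, 0.0, {"otherAudio1.wav": 0.7}),
    LeafP("otherAudio2", 3.0, 7.0, 0.0, {"otherAudio2.wav": 0.7}),
]
# Second pattern: both clip leaves in a gain-1 div.
second_pattern = [
    LeafP("otherAudio1", 1.0, 5.0, 1.0, {"otherAudio1.wav": 0.7}),
    LeafP("otherAudio2", 3.0, 7.0, 1.0, {"otherAudio2.wav": 0.7}),
]

for t in (2.0, 4.0, 6.0):
    print(f"t={t}s first={mix_at(t, first_pattern)} second={mix_at(t, second_pattern)}")
```

At 4s both patterns produce the same clip gains, but the second one delivers the programme audio with a total gain of 2, which is the doubling described above.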

BUT...

<body>
   <div tta:gain="1">
    <p>
      <audio xml:id="otherAudio1" begin="1s" end="5s" src="otherAudio1.wav" tta:gain="0.7"/>
      <audio xml:id="otherAudio2" begin="3s" end="7s" src="otherAudio2.wav" tta:gain="0.7"/>
    </p>
   </div>
</body>

Is this also a valid construct? The overlap works, and it has only one active leaf... (My biggest problem with TTML: so many ways to say the same thing! That's sometimes not obvious to an implementer.)
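The single-leaf property can be checked mechanically. Here is a rough sketch using Python's standard `xml.etree.ElementTree` that parses the two div patterns above and counts the leaf content elements active at a given time. Assumptions, all mine: a leaf is a body/div/p/span with no body/div/p/span children; times are simple seconds offsets from the parent's begin; the tta namespace declaration is added only so the fragments parse; and the default TTML namespace is omitted for brevity. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TTA_NS = "http://www.w3.org/ns/ttml#audio"           # namespace for the tta: prefix
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id
CONTENT_TAGS = {"body", "div", "p", "span"}         # audio/animate are not counted


def seconds(value):
    """Parse an offset such as '3s'; only the 's' metric is handled in this sketch."""
    return None if value is None else float(value[:-1])


def wrap(fragment):
    """Wrap a fragment from the thread in a body that declares the tta prefix."""
    return ET.fromstring(f'<body xmlns:tta="{TTA_NS}">{fragment}</body>')


def active_leaves(elem, t, parent_begin=0.0, parent_end=float("inf")):
    """Yield content elements with no content-element children that are active at t."""
    begin_offset = seconds(elem.get("begin"))
    end_offset = seconds(elem.get("end"))
    # Child times are offsets from the parent's begin; the parent's end clips the child.
    begin = parent_begin + (begin_offset or 0.0)
    end = parent_end if end_offset is None else min(parent_end, parent_begin + end_offset)
    if not begin <= t < end:
        return
    children = [child for child in elem if child.tag in CONTENT_TAGS]
    if not children:
        yield elem  # a leaf in this sketch: no div/p/span children
        return
    for child in children:
        yield from active_leaves(child, t, begin, end)


ONE_P_PER_CLIP = """
<div tta:gain="1">
  <p xml:id="otherAudio1" begin="1s" end="5s"><audio src="otherAudio1.wav" tta:gain="0.7"/></p>
  <p xml:id="otherAudio2" begin="3s" end="7s"><audio src="otherAudio2.wav" tta:gain="0.7"/></p>
</div>"""

ONE_P_TWO_CLIPS = """
<div tta:gain="1">
  <p>
    <audio xml:id="otherAudio1" begin="1s" end="5s" src="otherAudio1.wav" tta:gain="0.7"/>
    <audio xml:id="otherAudio2" begin="3s" end="7s" src="otherAudio2.wav" tta:gain="0.7"/>
  </p>
</div>"""

for name, fragment in [("one p per clip", ONE_P_PER_CLIP), ("one p, two clips", ONE_P_TWO_CLIPS)]:
    leaves = list(active_leaves(wrap(fragment), 4.0))
    print(name, len(leaves), [leaf.get(XML_ID) for leaf in leaves])
```

During the 3-5s overlap the first pattern has two active leaves and the second has one, matching the "single leaf" reading of the constraint.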

One big question we have to ask is what constraints we should apply to avoid blowing up presentation engines. For example, is this also valid?

<body>
   <div tta:gain="1">
    <animate begin="0.0s" end="0.12s" tta:gain="1;0.39"/>
    <animate begin="9.8s" end="10s" tta:gain="0.39;1"/>
    <p>
      <audio xml:id="otherAudio1" begin="1s" end="5s" src="otherAudio1.wav" tta:gain="0.7"/>
      <audio xml:id="otherAudio2" begin="3s" end="7s" src="otherAudio2.wav" tta:gain="0.7"/>
    </p>
   </div>
</body>

and if there were 2000 animates and 1000 audios, is it 'reasonable'?

But we may be drifting far off this topic. Does it seem reasonable to restrict output to a single leaf having audio? Maybe, if we can agree on recommendations for achieving playback of overlapping audio that work with a single leaf, and define what happens if there ARE multiple leaves (even if it's 'presentation engine dependent', we could recommend 'first leaf found plays'). But if multiple audio elements are allowed inside a single leaf, does that relieve the original concern about complex audio graphs? Actually, maybe it does simplify some aspects of implementation, and it forces people to think about avoiding the 'double original audio' issue. Am I right to think that 'Single leaf' -> 'Single original'? If so, then the restriction has my vote, as I think this is a very important anomaly to avoid (until a use case is found that can't be supported and is important!).

br, simon

nigelmegitt commented 3 years ago

> Am I right to think that 'Single leaf' -> 'Single original'?

Yes, by definition, it means there's only one active route through the graph/tree to an output node.

> If so, then the restriction has my vote, as I think this is a very important anomaly to avoid (until a use case is found that can't be supported and is important!)

Agreed

> and if there were 2000 animates and 1000 audios, is it 'reasonable'?

This points to a whole other topic that should be independent of the syntax, given, as you note, the possibility of generating the same output via different arrangements of inputs. In IMSC there's the HRM (Hypothetical Render Model), which attempts to quantify the presentation complexity a player might experience. We don't have anything like that in ADPT, and creating one would be a lot of work.

I'm inclined to proceed without one and if real world content presents playback issues, deal with them as we can. If you think we need document constraints to ensure player performance success, please open that as a separate issue.

While I'm touching on this, I should say that those performance issues may be generated by input document complexity, but they may also be generated by dependency on features that are notoriously awkward for players. For example if you try to seek accurately into an MP3 audio file, the results will be very different to doing the same with a WAV file: the time it takes the player to arrive at the seek point will be different, and the actual arrived-at seek point will be different too. I've seen these effects in Adhere, and they're not pleasant.

I'm sure there are implementation techniques for working around these issues, but "out of the box" audio presentation in browsers isn't necessarily optimised for those use cases.

btsimonh commented 3 years ago

I do like IMSC because Pierre tried to quantify the render issues, a bit like the DVB decoder models. It shows people what may be involved, and why they should be mindful of it. Yes, making an HRM for this application may feel complex and like a lot of work, but a first bare-bones version would be less difficult, and in a few paragraphs would give implementers an indication of the complexity and pitfalls that we have imagined (even if it's not a rigorous analysis). Let me think on it.

nigelmegitt commented 3 years ago

> Let me think on it.

👍🏻

> I do like IMSC because Pierre tried to quantify the render issues

Just a historical note: IMSC originated in a member submission of CFF-TT, with work done by various folk, which then went through W3C's processes and various other amendments before becoming IMSC. It's not only the work of the Editor, though his contribution has been huge!

nigelmegitt commented 1 year ago

Closing since work on this has moved to DAPT, but not losing this: opened w3c/dapt#170 to discuss constraints on audio resource playback within DAPT.