Support for Karaoke

cconcolato commented 5 years ago

Context

Karaoke is a well-known timed text application: song lyrics are displayed on top of a corresponding video clip, with timed emphasis on the words or characters to indicate to the viewer which words/characters have been sung, are being sung or will be sung.

The emphasis can take different forms, such as a different text color or a (bouncing) dot or image placed on the word currently being sung. It can be constant, or continuous with transitions within a word or from word to word. The emphasis behavior and style are constant within a document.

Additional transitions can be applied before/after/between timed text events.

Examples of karaoke can be found on YouTube, where the text is burned into the video, for example Moana or Frozen.

Proposed Requirements

We propose two types of requirements:

requirements to add more semantics to the document that can be processed independently of any styles, which may or may not be present in the document.
requirements on how styles can be applied for rendering on clients.

These requirements are applicable to TTML and IMSC.

R1: It shall be possible to associate time values with words/spans of characters of a timed text event without affecting their visibility (as would be done with a begin attribute on a ‘p’ or ‘span’ element). Such time values are hereafter called inner-event time values.

R2: It shall be possible to indicate the beginning and end of a karaoke section within a timed text document that also conveys non-karaoke content before and/or after the karaoke content.

R3: It shall be possible to indicate transitions between timed text events.

R4: It shall be possible, at the document header level, to associate karaoke style changes with inner-event time values.

R5: It shall be possible to let the presentation processor apply its own style and behavior (including transitions) across multiple karaoke documents (e.g. for consistency of the user experience across documents).

R6: It should be possible to associate transition styles at the beginning/end or between certain events.

Possible solutions

All “possible solutions” proposed here are initial ideas towards a solution, not full-fledged solutions. They are expected to be changed by the TTWG.

3.1. Without animation elements

Defining timing could be done, for example, with a new marker element (in the same or a different namespace, or using ttm:item or something else):

A simple case for a single timed text event could be as follows (space added for readability):

<p begin="0s" end="12s">
This text<marker begin="2s" type="karaoke"/>
uses inner timing<marker begin="4s" type="karaoke"/>
which can be used<marker begin="6s" type="karaoke"/>
for various applications<marker begin="8s" type="karaoke"/>
including karaoke</p>
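As a rough illustration of how a processor might read such inner-event time values, here is a minimal Python sketch. It rests on assumptions that the proposal does not settle: that each marker's begin is relative to the begin of the enclosing p, that a marker closes the text segment before it and opens the one after it, and that only simple "s" and "ms" offsets occur. The helper names seconds and split_segments are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup following the proposal; <marker> is not an existing TTML element.
SAMPLE = (
    '<p begin="0s" end="12s">'
    'This text<marker begin="2s" type="karaoke"/>'
    ' uses inner timing<marker begin="4s" type="karaoke"/>'
    ' which can be used<marker begin="6s" type="karaoke"/>'
    ' for various applications<marker begin="8s" type="karaoke"/>'
    ' including karaoke</p>'
)


def seconds(value):
    """Convert a simple offset such as '2s' or '500ms' to seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unsupported time expression: {value!r}")


def split_segments(p):
    """Return (text, start, end) tuples, one per run of text between markers.

    Assumption: marker times are offsets from the begin of the enclosing <p>.
    """
    p_begin = seconds(p.get("begin"))
    p_end = seconds(p.get("end"))
    texts = [p.text or ""]
    times = [p_begin]
    for child in p:
        if child.tag != "marker":
            raise ValueError("only <marker> children are handled in this sketch")
        times.append(p_begin + seconds(child.get("begin")))
        texts.append(child.tail or "")
    times.append(p_end)
    return [(text.strip(), times[i], times[i + 1]) for i, text in enumerate(texts)]


segments = split_segments(ET.fromstring(SAMPLE))
for text, start, end in segments:
    print(f"{start:>4.1f}s to {end:>4.1f}s  {text}")
```

Note that the text stays visible for the whole 0s to 12s interval of the p element; the markers only partition it, which is the point of R1.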

A more complex case (karaoke within non-karaoke content and with event transitions) could be:

<p begin="W" end="X">Non karaoke text</p>
<p begin="A" end="B">
<marker type="karaoke-start"/>
Bla bla<marker begin="XXms" type="karaoke"/>
Bla bla<marker begin="YYms" type="karaoke"/>
Bla bla<marker type="karaoke-short-transition" begin="XXms"/>
</p>
<p begin="C" end="D">
Sing sing<marker begin="XXms" type="karaoke"/>
Oh oh oh<marker begin="YYms" type="karaoke"/>
Song Song<marker type="karaoke-short-transition" begin="XXms"/>
</p>
<p begin="E" end="F">
Ti ti ti<marker begin="XXms" type="karaoke"/>
Ta ta ta<marker begin="YYms" type="karaoke"/>
hi hi hi<marker begin="ZZms" type="karaoke"/>
ha ha ha<marker type="karaoke-long-transition" begin="XXms"/>
</p>
<p begin="G" end="H">
Di di di<marker begin="XXms" type="karaoke"/>
Da da da<marker begin="YYms" type="karaoke"/>
Do do do<marker type="karaoke-end"/>
</p>
<p begin="Y" end="Z">Non karaoke text</p>

The association of a presentation-processor-specific behavior with the karaoke markers could simply be based on the type attribute, whose value is karaoke in this case.
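One way to picture this type-based association is a lookup table owned by the presentation processor. The Python sketch below is only illustrative: the marker type values are taken from the examples, while the action names, the DEFAULT_ACTIONS table and the actions_for helper are invented here and are not part of any specification.

```python
# Hypothetical processor defaults keyed by the marker's type attribute.
# The type values come from the examples; the action names are invented.
DEFAULT_ACTIONS = {
    "karaoke-start": "enter karaoke mode",
    "karaoke": "advance emphasis",
    "karaoke-short-transition": "play short transition",
    "karaoke-long-transition": "play long transition",
    "karaoke-end": "leave karaoke mode",
}


def actions_for(marker_types, overrides=None):
    """Map marker types to processor actions; unknown types are ignored."""
    table = dict(DEFAULT_ACTIONS)
    table.update(overrides or {})
    return [table[t] for t in marker_types if t in table]


# Marker types of the first karaoke event in the complex example.
first_event = ["karaoke-start", "karaoke", "karaoke", "karaoke-short-transition"]
print(actions_for(first_event))
```

Because the table lives in the processor rather than in the document, the same behavior can be applied across multiple karaoke documents, in the spirit of R5.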

The association of document-specific styles to the markers could also be done via the style element:

<head>
  <styling>
    <style xml:id="karaoke"
           tts:karaoke-type="discrete | continuous"
           tts:karaoke-emphasis="..."/>
  </styling>
</head>

where karaoke-emphasis would be based on text-emphasis and extended with the ability to:

use a PNG/SVG image for the emphasis-style
indicate transition styles (within words, between words, between events)
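The proposal does not define what "discrete" and "continuous" would mean precisely. As a hedged sketch of one possible reading, the Python function below treats "discrete" as emphasizing a whole segment as soon as its start time is reached, and "continuous" as sweeping the emphasis linearly across the segment. The function name emphasis_fraction and both interpretations are assumptions made for illustration.

```python
def emphasis_fraction(start, end, t, karaoke_type):
    """Return how much of a segment is emphasized at time t, from 0.0 to 1.0.

    Assumed semantics (not defined by the proposal):
    - "discrete": the whole segment switches to emphasized at its start time.
    - "continuous": the emphasis grows linearly from start to end.
    """
    if end <= start:
        raise ValueError("segment must have a positive duration")
    if karaoke_type == "discrete":
        return 1.0 if t >= start else 0.0
    if karaoke_type == "continuous":
        return min(1.0, max(0.0, (t - start) / (end - start)))
    raise ValueError(f"unknown karaoke type: {karaoke_type!r}")


# A segment sung between 2s and 4s, sampled at 3s.
print(emphasis_fraction(2.0, 4.0, 3.0, "discrete"))
print(emphasis_fraction(2.0, 4.0, 3.0, "continuous"))
```

A renderer could use the returned fraction to decide how much of the segment to recolor, or how far along the segment to place a dot or image.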

3.2. Using set elements (for discrete animations)

Defining timing could be done as follows:

<p begin="0s" end="10s">
<span>This text<set style="karaoke" begin="0s"/></span>
<span> uses set elements<set style="karaoke" begin="2s"/></span>
<span>inside spans<set style="karaoke" begin="4s"/></span>
<span>to trigger<set style="karaoke" begin="6s"/></span>
<span>the animations <set style="karaoke" begin="8s"/></span>
<span>using global styles<set style="karaoke" begin="10s"/></span>
</p>

The same style definitions as in the previous case could be used. Transitions would have to be added.
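To make the set-based variant concrete, the following Python sketch resolves which spans would carry the karaoke style at a given time. It assumes that a set with style="karaoke" (an attribute usage taken from this proposal, not from current TTML) stays in effect from its begin until the end of the paragraph, that begin values are offsets from the begin of the p element, and that the paragraph is active on the half-open interval from begin to end. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup following the proposal; style on <set> is an assumption.
SAMPLE = (
    '<p begin="0s" end="10s">'
    '<span>This text<set style="karaoke" begin="0s"/></span>'
    '<span> uses set elements<set style="karaoke" begin="2s"/></span>'
    '<span>inside spans<set style="karaoke" begin="4s"/></span>'
    '<span>to trigger<set style="karaoke" begin="6s"/></span>'
    '<span>the animations <set style="karaoke" begin="8s"/></span>'
    '<span>using global styles<set style="karaoke" begin="10s"/></span>'
    '</p>'
)


def seconds(value):
    """Convert a simple offset such as '2s' or '500ms' to seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unsupported time expression: {value!r}")


def emphasized_spans(p, t):
    """List the text of spans whose karaoke <set> has begun at time t."""
    p_begin = seconds(p.get("begin"))
    p_end = seconds(p.get("end"))
    if not (p_begin <= t < p_end):
        return []  # the paragraph itself is not active
    active = []
    for span in p.findall("span"):
        for set_el in span.findall("set"):
            begun = p_begin + seconds(set_el.get("begin")) <= t
            if set_el.get("style") == "karaoke" and begun:
                active.append((span.text or "").strip())
                break
    return active


paragraph = ET.fromstring(SAMPLE)
print(emphasized_spans(paragraph, 5.0))
```

Under these assumptions the last set, which begins at 10s, never takes effect, because the paragraph ends at 10s.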

css-meeting-bot commented 5 years ago

The Timed Text Working Group just discussed Support for Karaoke tt-reqs#9, and agreed to the following:

RESOLUTION: We will take these requirements forward for our 2019 work.

The full IRC log of that discussion

<nigel> Topic: Support for Karaoke tt-reqs#9
<nigel> github: https://github.com/w3c/tt-reqs/issues/9
<nigel> Nigel: This is closely related to some of the other issues we discussed earlier, including granular timing and transitions.
<nigel> Cyril: The use cases are quite different. It is not about responsiveness but transition styles, not about
<nigel> .. changing the layout. We would like to signal the semantics of karaoke separately from the styling.
<nigel> .. For example ingesting IMSC content with karaoke styling and then we would decide what karaoke means
<nigel> .. and apply styling ourselves, or in the document.
<nigel> Glenn: Without agreeing with the specific requirements or proposed solutions I'm in favour of moving
<nigel> .. forward with some kind of requirement for supporting karaoke. I need to digest this a little more and I
<nigel> .. also want to look at what ARIB-TT did because they defined some support for karaoke that I haven't
<nigel> .. looked at in detail. It is something to map to a module if possible.
<nigel> Nigel: I see a lot of overlap between different requirements today, for example the text emphasis style
<nigel> .. seems to have something in common with the inline image requirement.
<nigel> Pierre: Emphasis style allows a quoted string so you can have the emphasis be whatever you want.
<nigel> Cyril: Okay that's a potential good solution.
<nigel> Glenn: It could conceivably even be an animated glyph because you can use SVG animation in an embedded font.
<nigel> Cyril: I'm interested in animation between words not within the glyph.
<nigel> Nigel: Isn't this a completely new layout requirement for animating a moving ball between words?
<nigel> Vlad: It is and strictly animations are not permitted in SVG glyphs.
<nigel> Cyril: I am fine with this in a TTML module but what about IMSC, maybe a karaoke profile?
<nigel> Pierre: One data point supporting getting it into IMSC at some point is that IMF supports karaoke without
<nigel> .. this key functionality. I suggest doing the TTML3 module first and then if there is industry support
<nigel> .. adding it into IMSC as a small change.
<nigel> .. Then if it doesn't have to be part of IMSC by end of 2019 that makes it a lot easier.
<nigel> Cyril: Instead of IMSC 2019 being published on the same date as TTML3 then we could pipeline them and
<nigel> .. that would make it easier.
<Vlad> SVG glyph limitations as defined by the OpenType spec: https://docs.microsoft.com/en-us/typography/opentype/spec/svg#svg-documents
<nigel> Pierre: We said we would try to publish all new specifications by end of 2019 and pick requirements based on that target.
<nigel> Glenn: To raise the modularisation approach we should be motivated to minimise what we have to do in a TTML3 baseline document
<nigel> .. so we get it out the door more quickly than the modules we define functionality in, by focusing on the
<nigel> .. framework for modularisation as the key thing in TTML, then focus in parallel the modules that take
<atai1> q+
<nigel> .. advantage of that framework. Then we decouple to a certain extent getting IMSC to a particular gate.
<nigel> Pierre: Makes sense to me. If we think we get it done by June in TTML3 then the feature in IMSC by the end
<nigel> .. of the year is feasible.
<nigel> Andreas: Is this issue accepted?
<nigel> PROPOSAL: Take up the karaoke requirements in 2019
<nigel> Glenn: There are a bunch of requirements, 6 of them, I don't know if I would agree to all those at this time
<nigel> .. but I think we should move those forward.
<nigel> RESOLUTION: We will take these requirements forward for our 2019 work.

hober commented 5 years ago

Have you seen Microsoft's proposed Highlight API? It seems relevant.

https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/master/highlight/explainer.md

lborgman commented 5 years ago

Please add the tag "accessibility" (if there is such a tag). This can help hearing impaired people.

skynavga commented 5 years ago

@lborgman please describe what you mean by an "accessibility" "tag"? there are already many accessibility features in TTML, so please indicate what you are asking for that is not already supported in TTML
