Improve TTML caption rendering

palemieux commented 6 years ago

The captions in the following MPD should be presented in a region that starts 30% from the left edge and ends 10% from the right edge, and the font size should be 1/30 of the height of the video.

https://palemieux.com/public/foms2017/CEP150_512kb.mpd

Happy to provide additional information.

P.S.: consider using the W3C IMSC1 test suite to validate IMSC1 rendering, or using the imscJS polyfill, which is also used by dash.js.
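
For concreteness, the layout requested above can be worked out numerically. The sketch below is only an illustration: the helper name `captionLayout` and the 1280x720 frame size are assumptions, not values taken from the MPD or from Shaka Player.

```javascript
// Hypothetical helper: computes the caption region and font size described
// in the request for a given video frame size (in pixels).
function captionLayout(videoWidth, videoHeight) {
  const leftInsetPercent = 30;   // region starts 30% from the left edge
  const rightInsetPercent = 10;  // region ends 10% from the right edge

  const left = Math.round(videoWidth * leftInsetPercent / 100);
  const right = Math.round(videoWidth * (100 - rightInsetPercent) / 100);

  return {
    left,                                    // x offset of the region
    width: right - left,                     // 60% of the frame width
    fontSize: Math.round(videoHeight / 30),  // 1/30 of the frame height
  };
}

const layout = captionLayout(1280, 720);
console.log(layout);  // { left: 384, width: 768, fontSize: 24 }
```

On a 1280x720 frame, the region would therefore span x = 384 to x = 1152, with 24-pixel text.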

joeyparrish commented 6 years ago

Text display is currently handled by the browser or the application. Browsers do not currently support regions, but we do parse them and make the information available to text display plugins.

Adding this to the backlog. We will consider a complete DOM-based implementation of TextDisplayer.

joeyparrish commented 6 years ago

In the meantime, feel free to write your own plugin for this. We would be happy to review a pull request if you decide to work on this.

palemieux commented 6 years ago

@joeyparrish Thanks for the feedback. Is it possible for a plug-in to:

receive the media sample, i.e. TTML document, and a target rendering DOM element (e.g. div overlaying the video object)?
return the times at which it would like to be called back, and opaque state information to be passed at each call back?

At each callback, the TTML plugin would draw the subtitle/captions in the target rendering DOM element.
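
The plug-in model proposed above can be sketched roughly as follows. Everything here is hypothetical: `createCaptionPlugin`, the `{begin, end, text}` entries standing in for a parsed TTML document, and the `times`/`render` return shape are illustrative names, not part of Shaka Player or imscJS. The "opaque state" is simply held in the closure.

```javascript
// Hypothetical plug-in shape; none of these names come from Shaka Player.
// `doc` stands in for a parsed TTML document, reduced here to a list of
// {begin, end, text} entries with times in seconds.
function createCaptionPlugin(doc, target) {
  // Every begin or end time is a moment at which the display may change.
  const times = [...new Set(doc.flatMap((c) => [c.begin, c.end]))]
      .sort((a, b) => a - b);

  // Called by the player at each requested time; redraws the target.
  function render(time) {
    const active = doc.filter((c) => c.begin <= time && time < c.end);
    target.textContent = active.map((c) => c.text).join('\n');
  }

  return {times, render};
}

// A plain object stands in for the overlay <div> so the sketch runs anywhere.
const overlay = {textContent: ''};
const plugin = createCaptionPlugin([
  {begin: 1, end: 3, text: 'Hello'},
  {begin: 3, end: 5, text: 'World'},
], overlay);

plugin.render(plugin.times[1]);  // time 3: 'Hello' has ended, 'World' begins
console.log(plugin.times, overlay.textContent);  // [ 1, 3, 5 ] World
```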

joeyparrish commented 6 years ago

It's not a TTML plugin, but rather a text display plugin. It receives cues which it is responsible for rendering. Here's the interface documentation:

https://shaka-player-demo.appspot.com/docs/api/shakaExtern.TextDisplayer.html

Here's the documentation for the cue objects:

https://shaka-player-demo.appspot.com/docs/api/shaka.text.Cue.html

This system is more general than TTML. It will be used by the player for all subtitle/caption rendering, regardless of the input format.

You can pass whatever state you want into the constructor or into additional methods you have on your class.
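
A minimal sketch of such a text display plug-in might look like the following. The method names (`append`, `remove`, `isTextVisible`, `setTextVisibility`, `destroy`) and the cue fields (`startTime`, `endTime`, `payload`) are assumed from the interface and cue documentation linked above and should be checked against those pages. The class name, the `label` constructor argument, and the `activeText` helper are hypothetical additions for illustration.

```javascript
// Sketch of a text display plug-in. Interface method names are assumed from
// the TextDisplayer documentation; everything else is hypothetical.
class ConsoleTextDisplayer {
  constructor(label) {
    this.label_ = label;    // arbitrary application state passed in
    this.cues_ = [];
    this.visible_ = false;
  }

  // Receives newly parsed cues from the player.
  append(cues) {
    this.cues_.push(...cues);
  }

  // Drops every cue that overlaps the range [startTime, endTime).
  remove(startTime, endTime) {
    this.cues_ = this.cues_.filter(
        (cue) => cue.endTime <= startTime || cue.startTime >= endTime);
    return true;
  }

  isTextVisible() {
    return this.visible_;
  }

  setTextVisibility(on) {
    this.visible_ = on;
  }

  destroy() {
    this.cues_ = [];
    return Promise.resolve();
  }

  // Not part of the interface: returns the text that should be on screen at
  // `time`, so the sketch has something observable without a DOM.
  activeText(time) {
    if (!this.visible_) return '';
    return this.cues_
        .filter((cue) => cue.startTime <= time && time < cue.endTime)
        .map((cue) => cue.payload)
        .join('\n');
  }
}

// Plain objects stand in for cue instances here.
const displayer = new ConsoleTextDisplayer('demo');
displayer.append([{startTime: 0, endTime: 2, payload: 'First line'}]);
displayer.setTextVisibility(true);
console.log(displayer.activeText(1));
```

A real implementation would draw into a DOM element overlaying the video instead of returning strings.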

palemieux commented 6 years ago

Ok. Thanks for the details. As I understand it,

the media track is parsed and Text Cue instances are generated
these Text Cue instances are passed to a Text Displayer plugin for rendering

Is that right? If so, what happens if the current Text Cue interface does not support the full range of capabilities offered by TTML? Can the media track parser return any object as long as it implements the Text Cue interface? Will the full object be passed back to the plugin?

I am thinking the media track parser could return a TTML cue that implements the current interface (for minimal compatibility) but also contains the information (as private members) needed to fully render the TTML cue. Would that work?

Thanks for your help. Happy to continue the discussion offline. Feel free to DM me at pal@sandflow.com.

joeyparrish commented 6 years ago

The shaka.text.Cue class is owned by us, and is independent of the browser. If it is missing some TTML feature you need, we can extend it. We intend it to be generic enough to support both TTML and WebVTT.

The exact Cue object (as output by the TTML parser) will be sent to the TextDisplayer plugin. If there is something missing in that object, we would be happy to take either a feature request or pull request to add fields to Cue and add parser support for them in the TTML parser.

palemieux commented 6 years ago

TTML supports capabilities beyond what shaka.text.Cue supports, including:

style variations within a cue (color, text decoration, font weight, etc.)
position of the cue within the video
etc...

A few options:

extend shaka.text.Cue to support all these variations
allow shaka.text.Cue to carry an HTML fragment that would be generated by the TTML parser
allow shaka.text.Cue to carry a TTML-specific object that would be generated by the TTML parser, and be available to the TTML rendering plugin

imscJS can readily support the second and third options.
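
As a rough illustration of the third option, a cue could keep the basic fields that generic renderers read while carrying an extra format-specific object for a TTML-aware renderer. In the sketch below, the `ttml` property, its `spans` layout, and the `renderCue` function are all hypothetical; only `startTime`, `endTime`, and `payload` are taken from the cue documentation. Colors are assumed to be trusted values.

```javascript
function escapeHtml(text) {
  return text
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;');
}

// A generic renderer would read only `payload`; a TTML-aware renderer
// prefers the richer (hypothetical) `ttml` object when it is present.
function renderCue(cue) {
  if (!cue.ttml) {
    return escapeHtml(cue.payload);
  }
  return cue.ttml.spans
      .map((s) => `<span style="color:${s.color}">${escapeHtml(s.text)}</span>`)
      .join('');
}

const richCue = {
  startTime: 0,
  endTime: 2,
  payload: 'Hello world',  // plain-text fallback for minimal compatibility
  ttml: {
    spans: [
      {text: 'Hello ', color: 'white'},
      {text: 'world', color: 'yellow'},
    ],
  },
};

console.log(renderCue(richCue));
console.log(renderCue({startTime: 2, endTime: 4, payload: 'a < b'}));
```

The second option (carrying an HTML fragment) would amount to storing the output of something like `renderCue` on the cue at parse time rather than generating it at display time.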

Thoughts?

nigelmegitt commented 6 years ago

Makes sense to allow the Cue object to support richer definition of payload content than it does now. Maybe even by allowing Cue.payload to be a Cue itself, or if you don't want to support infinite nesting, then by including some kind of markup within the payload that would override the appropriate properties like fontStyle, fontWeight, color etc. for a subsection of text within them.

Some approach like that will be required to support TTML properly since it allows style attributes to be specified on spans within what are considered "cues" here.
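
The nesting idea can be sketched as a small tree of cues in which a child inherits its parent's style and overrides individual properties. The cue shape used here (`payload`, `style`, `nestedCues`) and the `flattenCue` function are hypothetical and only meant to show how a renderer could turn such a tree into styled text runs.

```javascript
// Hypothetical cue shape: {payload, style, nestedCues}.
// Returns a flat list of text runs, each with its fully resolved style.
function flattenCue(cue, inherited = {}) {
  // Child properties override inherited ones; the rest pass through.
  const style = {...inherited, ...cue.style};
  if (!cue.nestedCues || cue.nestedCues.length === 0) {
    return [{text: cue.payload, style}];
  }
  return cue.nestedCues.flatMap((child) => flattenCue(child, style));
}

const runs = flattenCue({
  payload: '',
  style: {color: 'white', fontWeight: 'normal'},
  nestedCues: [
    {payload: 'This is '},
    {payload: 'important', style: {fontWeight: 'bold'}},
  ],
});

console.log(runs);
// [ { text: 'This is ', style: { color: 'white', fontWeight: 'normal' } },
//   { text: 'important', style: { color: 'white', fontWeight: 'bold' } } ]
```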

joeyparrish commented 6 years ago

First, let me address using imscJS for TTML support:

Our current TTML parser, limited though it may be, compiles to about 7,939 bytes. imscJS + its sax dependency is 159,787 bytes. That's a 20x increase in size for the TTML parser itself. Shaka Player as a whole is currently 185,141 bytes (in my working directory, anyway), so adding imscJS + sax would be an 86% increase in the size of Shaka Player itself.
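
The two figures quoted above follow directly from the byte counts. The variable names below are only labels for those numbers.

```javascript
const currentParserBytes = 7939;    // existing TTML parser
const imscWithSaxBytes = 159787;    // imscJS plus its sax dependency
const shakaPlayerBytes = 185141;    // Shaka Player as a whole

// Size of imscJS + sax relative to the existing parser.
const parserRatio = imscWithSaxBytes / currentParserBytes;

// Growth of the whole player if imscJS + sax were added, in percent.
const playerIncrease = imscWithSaxBytes / shakaPlayerBytes * 100;

console.log(parserRatio.toFixed(1));     // 20.1
console.log(playerIncrease.toFixed(1));  // 86.3
```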

That is way too big for a built-in subtitle parser. If you want to use imscJS to handle TTML, you can always have a text-parsing plugin at the application level which replaces our default TTML parser. See the API docs on TextParser if you want to pursue that: https://shaka-player-demo.appspot.com/docs/api/shakaExtern.TextParser.html
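
An application-level text parser plug-in could be shaped roughly like the sketch below. The method names `parseInit` and `parseMedia`, and the `periodStart` field on the time argument, are assumptions based on the TextParser documentation linked above and should be verified there. The class name and the regular-expression "parser" are purely illustrative: it only understands `<p begin="Ns" end="Ns">text</p>` and is not a real TTML implementation. An actual plug-in would delegate to a full parser such as imscJS.

```javascript
// Toy stand-in for a TTML parser plug-in. Interface method names are
// assumed from the TextParser documentation; the parsing itself is a
// deliberately simplistic placeholder.
class ToyTtmlParser {
  // TTML needs no separate initialization segment in this sketch.
  parseInit(data) {}

  // `data` holds the bytes of one media segment; `time.periodStart` is
  // assumed to be the offset (in seconds) to add to cue times.
  parseMedia(data, time) {
    const xml = new TextDecoder('utf-8').decode(data);
    const pattern =
        /<p\s+begin="([\d.]+)s"\s+end="([\d.]+)s"\s*>([^<]*)<\/p>/g;
    const cues = [];
    let match;
    while ((match = pattern.exec(xml)) !== null) {
      cues.push({
        startTime: time.periodStart + Number(match[1]),
        endTime: time.periodStart + Number(match[2]),
        payload: match[3],
      });
    }
    return cues;
  }
}

const sample = new TextEncoder().encode(
    '<tt><body><div><p begin="1s" end="2.5s">Hi</p></div></body></tt>');
const parsed = new ToyTtmlParser().parseMedia(sample, {periodStart: 10});
console.log(parsed);
```

The plug-in would then be registered with the player for the TTML MIME type in place of the default parser. The exact registration call is described in the linked API docs.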

As for enriching our existing TTML parser, I'm completely open to that. But since it sounds like our Cue interface is not up to the task of full TTML support, it will need to be redesigned. If you want to contribute in this area, I would prefer to get a design proposal for Cue and iterate on that before you make a complete pull request. If you don't want to do this, my team will look into it once we have taken care of some higher-priority features. For now, this is still in the "backlog" milestone for us.

Thanks!

palemieux commented 6 years ago

> That is way too big for a built-in subtitle parser. If you want to use imscJS to handle TTML, you can always have a text-parsing plugin at the application level which replaces our default TTML parser.

Seems a reasonable approach: have the application provide text parser and cue renderer implementations, e.g. using imscJS. Perhaps a simple example is all that is required.

palemieux commented 6 years ago

@joeyparrish P.S.: I very much appreciate your taking the time to explore these various options.

joeyparrish commented 2 years ago

This issue is quite old, and our TTML and rendering capabilities have grown a lot. If there are still gaps in our support of TTML parsing or rendering, please feel free to open new issues for them. Thanks!

shaka-project / shaka-player

Improve TTML caption rendering #1080