shaka-project / shaka-player

JavaScript player library / DASH & HLS client / MSE-EME player
Apache License 2.0

Improve TTML caption rendering #1080

Closed palemieux closed 2 years ago

palemieux commented 6 years ago

The captions in the following MPD should be presented in a region that starts 30% from the left edge and ends 10% from the right edge, and the font size should be 1/30 of the height of the video.
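For illustration, the geometry described above can be worked out in pixels for a given video size. This is only a sketch: `captionLayout` is a hypothetical helper, not part of Shaka Player or imscJS, and it assumes the region is measured against the full video frame.

```javascript
// Hypothetical helper: converts the percentages described in the report
// into pixel values for a video of the given size.
function captionLayout(videoWidth, videoHeight) {
  const left = (videoWidth * 30) / 100;        // region starts 30% from the left edge
  const rightMargin = (videoWidth * 10) / 100; // region ends 10% from the right edge
  return {
    left: left,
    width: videoWidth - left - rightMargin,    // the remaining 60% of the width
    fontSize: videoHeight / 30,                // font size is 1/30 of the video height
  };
}

const layout = captionLayout(1280, 720);
console.log(layout); // { left: 384, width: 768, fontSize: 24 }
```

For a 1280x720 video, the region would span from x = 384 to x = 1152, with 24-pixel text.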

https://palemieux.com/public/foms2017/CEP150_512kb.mpd

Happy to provide additional information.

P.S.: consider using the W3C IMSC1 test suite to validate IMSC1 rendering, or using the imscJS polyfill, which is also used by dash.js.

joeyparrish commented 6 years ago

Text display is currently handled by the browser or the application. Browsers do not currently support regions, but we do parse them and make the information available to text display plugins.

Adding this to the backlog. We will consider a complete DOM-based implementation of TextDisplayer.

joeyparrish commented 6 years ago

In the meantime, feel free to write your own plugin for this. We would be happy to review a pull request if you decide to work on this.

palemieux commented 6 years ago

@joeyparrish Thanks for the feedback. Is it possible for a plug-in to do the following?

At each callback, the TTML plugin would draw the subtitle/captions in the target rendering DOM element.

joeyparrish commented 6 years ago

It's not a TTML plugin, but rather a text display plugin. It receives cues, which it is responsible for rendering. Here's the interface documentation:

https://shaka-player-demo.appspot.com/docs/api/shakaExtern.TextDisplayer.html

Here's the documentation for the cue objects:

https://shaka-player-demo.appspot.com/docs/api/shaka.text.Cue.html

This system is more general than TTML. It will be used by the player for all subtitle/caption rendering, regardless of the input format.

You can pass whatever state you want into the constructor or into additional methods you have on your class.
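A minimal sketch of what such a text display plugin might look like is below. The method names (`append`, `remove`, `destroy`, `isTextVisible`, `setTextVisible`) follow my reading of the TextDisplayer interface linked above and should be checked against that documentation. The constructor argument, the `activeCues` helper, and the class name are assumptions made for illustration, and the cues are plain objects standing in for `shaka.text.Cue` instances.

```javascript
// Sketch of a text display plugin. It only stores cues and reports which
// ones are active; a real plugin would draw them into a DOM element.
class SketchTextDisplayer {
  // `container` is hypothetical application state passed to the constructor.
  constructor(container) {
    this.container_ = container;
    this.cues_ = [];
    this.visible_ = false;
  }

  // Called with newly parsed cues.
  append(cues) {
    this.cues_.push(...cues);
  }

  // Drops every cue that overlaps the range [startTime, endTime).
  remove(startTime, endTime) {
    this.cues_ = this.cues_.filter(
        (cue) => cue.endTime <= startTime || cue.startTime >= endTime);
    return true;
  }

  isTextVisible() {
    return this.visible_;
  }

  setTextVisible(on) {
    this.visible_ = on;
  }

  destroy() {
    this.cues_ = [];
    return Promise.resolve();
  }

  // Hypothetical extra method: the cues to draw at a given media time.
  activeCues(time) {
    return this.cues_.filter(
        (cue) => cue.startTime <= time && time < cue.endTime);
  }
}

const displayer = new SketchTextDisplayer({name: 'caption-container'});
displayer.append([
  {startTime: 0, endTime: 2, payload: 'Hello'},
  {startTime: 2, endTime: 4, payload: 'World'},
]);
console.log(displayer.activeCues(1).map((cue) => cue.payload)); // [ 'Hello' ]
```

The extra `activeCues` method is an example of the "additional methods" mentioned above: the application can call it from its own timer or `timeupdate` handler to decide what to draw.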

palemieux commented 6 years ago

Ok. Thanks for the details. As I understand it,

Is that right? If so, what happens if the current Text Cue interface does not support the full range of capabilities offered by TTML? Can the media track parser return any object as long as it implements the Text Cue interface? Will the full object be passed back to the plugin?

I am thinking the media track parser could return a TTML cue that implements the current interface (for minimal compatibility) but also contains the information (as private members) needed to fully render the TTML cue. Would that work?

Thanks for your help. Happy to continue the discussion offline. Feel free to DM me at pal@sandflow.com.

joeyparrish commented 6 years ago

The shaka.text.Cue class is owned by us, and is independent of the browser. If it is missing some TTML feature you need, we can extend it. We intend it to be generic enough to support both TTML and WebVTT.

The exact Cue object (as output by the TTML parser) will be sent to the TextDisplayer plugin. If there is something missing in that object, we would be happy to take either a feature request or pull request to add fields to Cue and add parser support for them in the TTML parser.

palemieux commented 6 years ago

TTML supports capabilities beyond what shaka.text.Cue supports, including:

Couple of options:

imscJS can readily support the second and third options.

Thoughts?

nigelmegitt commented 6 years ago

Makes sense to allow the Cue object to support richer definition of payload content than it does now. Maybe even by allowing Cue.payload to be a Cue itself, or if you don't want to support infinite nesting, then by including some kind of markup within the payload that would override the appropriate properties like fontStyle, fontWeight, color etc. for a subsection of text within them.

Some approach like that will be required to support TTML properly since it allows style attributes to be specified on spans within what are considered "cues" here.
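To make the nesting idea concrete, here is a rough sketch of how a renderer might flatten a cue whose payload is made of nested cues, with each child overriding the style properties it sets and inheriting the rest. The cue shape used here (`nestedCues`, `payload`, `color`, `fontStyle`, `fontWeight` on plain objects) is an assumption for illustration, not the actual `shaka.text.Cue` definition.

```javascript
// Style properties that a nested cue may override (illustrative subset).
const STYLE_KEYS = ['color', 'fontStyle', 'fontWeight'];

// Flattens a (possibly nested) cue into a list of styled text runs.
// A child inherits any style property it does not set itself.
function flattenCue(cue, inherited = {}) {
  const style = {};
  for (const key of STYLE_KEYS) {
    style[key] = cue[key] !== undefined ? cue[key] : inherited[key];
  }
  if (!cue.nestedCues || cue.nestedCues.length === 0) {
    return [Object.assign({text: cue.payload}, style)];
  }
  return cue.nestedCues.flatMap((child) => flattenCue(child, style));
}

// A hypothetical cue: white text, with one italic yellow span in the middle.
const runs = flattenCue({
  color: 'white',
  nestedCues: [
    {payload: 'This is '},
    {payload: 'important', color: 'yellow', fontStyle: 'italic'},
    {payload: ' text.'},
  ],
});
console.log(runs.map((run) => run.text + ':' + run.color));
// [ 'This is :white', 'important:yellow', ' text.:white' ]
```

This mirrors how TTML styles on a `span` override those inherited from the enclosing `p`, while leaving the top-level cue timing untouched.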

joeyparrish commented 6 years ago

First, let me address using imscJS for TTML support:

Our current TTML parser, limited though it may be, compiles to about 7,939 bytes. imscJS + its sax dependency is 159,787 bytes. That's a 20x increase in size for the TTML parser itself. Shaka Player as a whole is currently 185,141 bytes (in my working directory, anyway), so adding imscJS + sax would be an 86% increase in the size of Shaka Player itself.
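The ratios quoted above can be reproduced directly from the byte counts. The variable names below are only for illustration.

```javascript
// Byte counts as quoted in the comment above.
const ttmlParserBytes = 7939;
const imscPlusSaxBytes = 159787;
const shakaPlayerBytes = 185141;

// How much larger imscJS + sax is than the current TTML parser.
const parserRatio = imscPlusSaxBytes / ttmlParserBytes;

// Size added relative to the whole player, as a percentage.
const playerIncreasePercent = (imscPlusSaxBytes / shakaPlayerBytes) * 100;

console.log(parserRatio.toFixed(1) + 'x'); // 20.1x
console.log(playerIncreasePercent.toFixed(0) + '%'); // 86%
```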

That is way too big for a built-in subtitle parser. If you want to use imscJS to handle TTML, you can always have a text-parsing plugin at the application level which replaces our default TTML parser. See the API docs on TextParser if you want to pursue that: https://shaka-player-demo.appspot.com/docs/api/shakaExtern.TextParser.html
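As a rough sketch of that approach, an application-level parser could delegate the actual TTML work to an external library. The `parseInit` and `parseMedia` method names and the `periodStart` field follow my reading of the TextParser documentation linked above and should be verified there. Everything else is an assumption: `makeDelegatingParser`, the `parseDocument` callback (a stand-in for a library such as imscJS), the event shape it returns, and the extra `ttml` field are all hypothetical. A real parser would return `shaka.text.Cue` instances rather than plain objects.

```javascript
// Builds a parser class that delegates TTML parsing to `parseDocument`,
// a hypothetical function taking an XML string and returning an array of
// {begin, end, text} events. A factory is used on the assumption that the
// player constructs the parser itself, with no constructor arguments.
function makeDelegatingParser(parseDocument) {
  return class DelegatingTtmlParser {
    // Initialization segments carry no cues in this sketch.
    parseInit(data) {}

    // Decodes one media segment and maps library events to cue-like objects.
    parseMedia(data, time) {
      const xml = new TextDecoder('utf-8').decode(data);
      return parseDocument(xml).map((event) => ({
        startTime: event.begin + time.periodStart,
        endTime: event.end + time.periodStart,
        payload: event.text,
        ttml: event, // hypothetical: keeps the full event for a richer renderer
      }));
    }
  };
}

// Usage with a fake parseDocument standing in for a real TTML library.
const FakeParser = makeDelegatingParser((xml) => [
  {begin: 1, end: 3, text: 'length=' + xml.length},
]);
const cues = new FakeParser().parseMedia(
    new TextEncoder().encode('<tt/>'), {periodStart: 10});
console.log(cues[0].startTime, cues[0].endTime, cues[0].payload);
// 11 13 length=5
```

Keeping the original library event on the cue is one way a matching display plugin could get at TTML detail that the generic cue fields do not carry.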

As for enriching our existing TTML parser, I'm completely open to that. But since it sounds like our Cue interface is not up to the task of full TTML support, it will need to be redesigned. If you want to contribute in this area, I would prefer to get a design proposal for Cue and iterate on that before you make a complete pull request. If you don't want to do this, my team will look into it once we have taken care of some higher-priority features. For now, this is still in the "backlog" milestone for us.

Thanks!

palemieux commented 6 years ago

> That is way too big for a built-in subtitle parser. If you want to use imscJS to handle TTML, you can always have a text-parsing plugin at the application level which replaces our default TTML parser.

Seems a reasonable approach: have the application provide text parser and cue renderer implementations, e.g. using imscJS. Perhaps a simple example is all that is required.

palemieux commented 6 years ago

@joeyparrish P.S.: I very much appreciate your taking the time to explore these various options.

joeyparrish commented 2 years ago

This issue is quite old, and our TTML and rendering capabilities have grown a lot. If there are still gaps in our support of TTML parsing or rendering, please feel free to open new issues for them. Thanks!