Closed friday closed 1 year ago
In addition to that, all the cues are actually displayed as white, rather than their real colors, but this is already covered by https://github.com/shaka-project/shaka-player/issues/4479
I just double-checked in Chrome 106 on Linux. If you pass text like <c.lime>lime</c>
through new VTTCue
and TextTrack's addCue()
method, you get plain white text that says lime
. This isn't a Shaka Player limitation, but a limitation in Chrome's implementation of VTTCue
. What you're seeing is Shaka Player using the browser's native capabilities to show you the subtitles.
If you use Shaka's UI, or enable the UITextDisplayer
plugin instead of the default SimpleTextDisplayer
, you should get colors, layout, and other things Chrome can't do natively. UITextDisplayer
uses CSS & HTML to render subtitles more faithfully than a browser can natively. To enable that plugin without using the full UI, the video element will need to be inside a div that is the same size. We call this a "video container". Then you tell the player about that container:
player.setVideoContainer(containerDiv);
The video container must be marked by the application as position: relative
. Shaka Player will create another div on top of the video element (within the video container) with class shaka-text-container
. The minimum styles for that are:
.shaka-text-container {
position: absolute;
left: 0;
right: 0;
top: 0;
bottom: 0;
pointer-events: none;
font-size: 20px;
line-height: 1.4;
color: white;
}
Your application must provide those. (If you used our full UI, that would be included in the full UI stylesheet.) Other styles will be set as needed inline on the elements within shaka-text-container, to manage the styling of individual cues.
I'm embarrassed not to find any of this in our docs. I'll correct that soon.
Does this help?
Thank you for your fast and detailed response :pray:
I just double-checked in Chrome 106 on Linux. If you pass text like
<c.lime>lime</c>
throughnew VTTCue
and TextTrack'saddCue()
method, you get plain white text that sayslime
. This isn't a Shaka Player limitation, but a limitation in Chrome's implementation ofVTTCue
. What you're seeing is Shaka Player using the browser's native capabilities to show you the subtitles.
I see. I can accept that for sure. Maybe I should have clarified that that was just a side point though. I'm not using either the native browser subtitle renderer or Shakas UITextDisplayer (I was confused about how to do that for testing purposes though, so thanks for elaborating on this and improving the documentation).
The company I work for have to support multiple players, so we can't use the players or browsers custom subtitle renderers or the browser renderers. Our solution is to hide the native subtitles but still use the html 5 video's textTracks and cuechange
event to get the cues and render our own element for the cue positioned over the video. It's probably similar to how UITextDisplayer work, but I didn't look into that. With dash.js and hls.js for example they just feed the text into the cue, so it works for us. Shaka did too before 3.1.0. But after 3.1.0 Shaka removes the color information completely from the cue.
You can do this in any modern browser: new VTTCue(0, 1, "<c.lime>lime</c>").getCueAsHTML().firstChild
and it will output this: <span class="lime">lime</span>
But as of f42ccd2867f341e89cdf27316fd9bb7e63d22772 Shaka tries to parse the cue text payload and split it up into multiple cues etc via the parseCueStyles()
. In that PR you only handled valid xml, so efficiently this caused no change for us. But as of d882d28 you also convert the cue to valid xml and then set payload to only "lime"
. Not "<c.lime>lime</c>"
.
We can override it by overriding either parseCueStyles
or replaceColorPayload_
. Ex:
Object.assign(shaka.text.VttTextParser, {
parseCueStyles(payload, rootCue) {
rootCue.payload = payload;
},
});
Then it works again for our purposes, but breaks internal Shaka logic for UITextDisplayer
. And shaka.text.VttTextParserparseCueStyles is mangled in the regular build, so this only work with the debug build.
A more proper solution (that I ended up using) is to create a custom "text displayer" similar to SimpleTextDisplayer and solve it there.
My suggestion would be to move all this reformatting logic from these two PRs from (the actual issue with this is probably assuming the input format was WebVTT)VttTextParser
to UITextDisplayer
. I also think this could be simplified considerably if you could use the native browser function VTTCue.getCueAsHTML()
instead doing this yourselves, but I'm sure there were reasons I missed (maybe compatibility with very old browsers).
In addition to that, all the cues are actually displayed as white, rather than their real colors, but this is already covered by #4479
I just double-checked in Chrome 106 on Linux. If you pass text like
<c.lime>lime</c>
throughnew VTTCue
and TextTrack'saddCue()
method, you get plain white text that sayslime
. This isn't a Shaka Player limitation, but a limitation in Chrome's implementation ofVTTCue
. What you're seeing is Shaka Player using the browser's native capabilities to show you the subtitles.If you use Shaka's UI, or enable the
UITextDisplayer
plugin instead of the defaultSimpleTextDisplayer
, you should get colors, layout, and other things Chrome can't do natively.UITextDisplayer
uses CSS & HTML to render subtitles more faithfully than a browser can natively. To enable that plugin without using the full UI, the video element will need to be inside a div that is the same size. We call this a "video container". Then you tell the player about that container:player.setVideoContainer(containerDiv);
The video container must be marked by the application as
position: relative
. Shaka Player will create another div on top of the video element (within the video container) with classshaka-text-container
. The minimum styles for that are:.shaka-text-container { position: absolute; left: 0; right: 0; top: 0; bottom: 0; pointer-events: none; font-size: 20px; line-height: 1.4; color: white; }
Your application must provide those. (If you used our full UI, that would be included in the full UI stylesheet.) Other styles will be set as needed inline on the elements within shaka-text-container, to manage the styling of individual cues.
I'm embarrassed not to find any of this in our docs. I'll correct that soon.
Does this help?
I just spent the whole day debugging a similar issue (shaka-player without the UI rendering TTML cues without colors), just to find out this comment ! 😌 So yes, an update of the docs would be gladly welcomed ;)
Also, when I have a look at ui/less/containers.less
, line 315, I see that there is also this style :
.shaka-text-container {
// ...
span.shaka-text-wrapper { // <<===
display: inline;
background: none;
}
}
Do we also need this one ?
Thanks !
This is clearly a pretty complex workflow that has some room for improvements.
Shaka supports parsing VTT, TTML as well as several other (mostly simpler) formats. These are converted to Shaka's custom cue format, and if using SimpleTextDisplayer or WebVttGenerator they are converted (back) to VTTCues.
The VTT parser removes the original color classes, and neither of those classes adds them back when it converts back to VTT.
My preferred solution at the moment would be to
cue.convertToVTTCue()
to avoid the code duplication in SimpleTextDisplayer or WebVttGenerator, so they could call it instead, and I think it would be easier to test. One problem with this is that TTML can have weird nested structures and one cue might actually contain two cues that don't even have the same times. So either that has to be flattened, or the method has to return an array of VTTCues.shaka.text.Cue.defaultTextColor
but the keys and values inverted. This is what I'm doing now, but I don't think it's optimal.This of course would still lose TTML colors unless they were the exact values as in shaka.text.Cue.defaultTextColor
. One way to support them would be to extend VTTCues and/or add non-standard properties for them, and override getCueAsHTML()
so that it adds css properties for the colors, and that's probably taking it too far. Another way would be to add an API for retrieving the shaka cue or just the colors, using the vtt cue.
I would love to be able to add some contributions, but the more I looked into this the more complex the picture became.
I'm very much open to discussing a redesign of the cue system. I think what you are pointing out is only one of several complexities and drawbacks of the system we have today. I'm not certain, though, that adding on additional methods to Cue or adding the original payload to Cue is the best solution. It might make some operations more efficient at the expense of increasing overall complexity of Cue somewhat.
@friday, if you want to spend some time thinking about the cue system, you could write a proposal for what cues and text parsing could look like in the future (e.g., with no limits on breaking changes in Shaka Player v5). Would you be interested in designing something like that?
@friday, if you want to spend some time thinking about the cue system, you could write a proposal for what cues and text parsing could look like in the future (e.g., with no limits on breaking changes in Shaka Player v5). Would you be interested in designing something like that?
I don't think I'm qualified to write a proposal, since I haven't spent as much time with the Shaka source code as you, and hence wouldn't know all the complexities and drawbacks of the current system. Just the ones I bumped into. But I have some thoughts on the parts I'd like to improve. From my perspective we may not need a breaking/major version bump.
Do we really need cue.isContainer
? I believe this is only used for the TTML parser, and it's being flattened in the text displayers for the display output anyway. So it seems easier to me to flatten the cues in the TTML parser directly. This way every Shaka cue would represent an actual cue with a startTime and endTime. And for compatibility with custom text displayers we could keep it (always false
) and deprecate it until the next major version release.
The code duplication in SimpleTextDisplayer and WebVttGenerator, could be handled without the method I guess. I wanted it to be public for people like me implementing their own text displayers, but if we fix the problems so people don't have to do that, then we could move the code to an internal utility function instead. (avelad fixed this now in #4941 ✔️)
VttTextParser has two methods replaceColorPayload_
and replaceKaraokeStylePayload_
that converts the VTTCue format to valid XML compatible with shaka.util.XmlUtils.parseXmlString()
. This is a wrapper around new DOMParser().parseFromString()
that returns a DOM tree. This DOM tree is then traversed/processed by generateCueFromElement_()
. Would it not be possible to replace all of these by using VTTCue.getCueAsHTML()
? It also returns a DOM tree that can be traversed, and generateCueFromElement_()
could be adapted/rewritten to handle that? Maybe this is a browser compatibility (MS Edge?) issue? Although in that case maybe it should be polyfilled instead? Or we just don't support colors and karaoke timestamps on Edge (as long as we support it on physical devices with old Safari/Chrome).
Regarding the VTT colors, the VTTCue format is limited to web safe colors that have names (the spec is even more limited, but it's technically possible to use any class name, so we can use for example use "aquamarine" (#66ddaa
). As long as VTT is both the input and output and there is no custom cue styling that's fine. But if the input is TTML, then the color can be anything (color name, hex, rgb or rgba with alpha channel). So this problem has no good solutions. I would propose some smaller changes instead to improve the situation instead:
shaka.text.Cue.defaultTextColor.yellow
and shaka.text.Cue.defaultTextBackgroundColor.bg_yellow
would both be yellow
, not #FF0
(same for cue.color). By using this we don't have to convert hex colors back to class names for the simpletextdisplayer.::cue(.yellow) {color:gold}
) we can use color names as classes when possible, and use non-standard classes ex hex_color_77CCAAFF
and hex_color_bg_77CCAAFF
(where FF is the alpha channel). These would not work in a browser, but at least someone listening to cuechange
would be able to get the color codes from these classes without having to make their own text displayer. That would be good enough for me at least.If I take on these tasks and make PRs, would you prefer multiple small PRs? Personally I prefer multiple smaller ones.
- Do we really need
cue.isContainer
? I believe this is only used for the TTML parser, and it's being flattened in the text displayers for the display output anyway. So it seems easier to me to flatten the cues in the TTML parser directly. This way every Shaka cue would represent an actual cue with a startTime and endTime. And for compatibility with custom text displayers we could keep it (alwaysfalse
) and deprecate it until the next major version release.
/**
* If true, this represents a container element that is "above" the main
* cues. For example, the <body> and <div> tags that contain the <p> tags
* in a TTML file. This controls the flow of the final cues; any nested cues
* within an "isContainer" cue will be laid out as separate lines.
* @type {boolean}
* @exportDoc
*/
this.isContainer;
It impacts the layout (display styles) of elements in lib/text/ui_text_displayer.js.
If we're redesigning the system, anything could be replaced with some other way to get the same functionality. But this is what it's doing today.
- The code duplication in SimpleTextDisplayer and WebVttGenerator, could be handled without the method I guess. I wanted it to be public for people like me implementing their own text displayers, but if we fix the problems so people don't have to do that, then we could move the code to an internal utility function instead.
You're referring to flattenPayload? That could definitely be factored out. Whether or not we export it as part of the API contract of the library is another question. I'm not against it, though. We have lots of utilities exported because of their usefulness in writing network filters, manifest parsers, etc.
- VttTextParser has two methods
replaceColorPayload_
andreplaceKaraokeStylePayload_
that converts the VTTCue format to valid XML compatible withshaka.util.XmlUtils.parseXmlString()
. This is a wrapper aroundnew DOMParser().parseFromString()
that returns a DOM tree. This DOM tree is then traversed/processed bygenerateCueFromElement_()
. Would it not be possible to replace all of these by usingVTTCue.getCueAsHTML()
? It also returns a DOM tree that can be traversed, andgenerateCueFromElement_()
could be adapted/rewritten to handle that? Maybe this is a browser compatibility (MS Edge?) issue? Although in that case maybe it should be polyfilled instead? Or we just don't support colors and karaoke timestamps on Edge (as long as we support it on physical devices with old Safari/Chrome).
I'm not so familiar with getCueAsHTML(), so I don't know the answer. We would have to check that it works on all platforms (including Xbox, Tizen, etc). Our test automation lab is pretty good for things like this, though. You could write a test in Jasmine (test/ folder) that checks the functionality, and send it in a standalone PR. I could then trigger the lab to test your PR and we would find out if it is usable or not.
Regarding the VTT colors, the VTTCue format is limited to web safe colors that have names (the spec is even more limited, but it's technically possible to use any class name, so we can use for example use "aquamarine" (
#66ddaa
). As long as VTT is both the input and output and there is no custom cue styling that's fine. But if the input is TTML, then the color can be anything (color name, hex, rgb or rgba with alpha channel). So this problem has no good solutions. I would propose some smaller changes instead to improve the situation instead:
- Use color names as values for the default color named (defined here). So
shaka.text.Cue.defaultTextColor.yellow
andshaka.text.Cue.defaultTextBackgroundColor.bg_yellow
would both beyellow
, not#FF0
(same for cue.color). By using this we don't have to convert hex colors back to class names for the simpletextdisplayer.- For other colors (if the input was TTML or using custom styling like
::cue(.yellow) {color:gold}
) we can use color names as classes when possible, and use non-standard classes exhex_color_77CCAAFF
andhex_color_bg_77CCAAFF
(where FF is the alpha channel). These would not work in a browser, but at least someone listening tocuechange
would be able to get the color codes from these classes without having to make their own text displayer. That would be good enough for me at least.
That all sounds reasonable.
If I take on these tasks and make PRs, would you prefer multiple small PRs? Personally I prefer multiple smaller ones.
Yeah, smaller PRs are easier to review and we can make incremental progress.
Can you test with main branch? Thanks!
Can you test with main branch? Thanks!
It's not fixed in main (v4.3.4-main-2-g7439a264).
The issue as I defined it is that the vtt parser stores colors in a way that the default (simple) text displayer can't access/understand, and doesn't even try.
In effect, the parser takes the VTT input payload <c.red>Text is red</c>
and the output is just Text is red
, when it "should" be the same as the input for the conversion flow not to be lossy.
So if someone tries to listen to TextTrack changes (the standard HTML5 Video way) and implement their own rendering, then the color information is lost, although as confirmed it works with UITextDisplayer (which doesn't create TextTracks).
Have you read the FAQ and checked for duplicate open issues? Yes
What version of Shaka Player are you using? 4.2.1
Can you reproduce the issue with our latest release version? Yes
Can you reproduce the issue with the latest code from
main
? YesAre you using the demo app or your own custom app? Custom app
If custom app, can you reproduce the issue using our demo app? No. It doesn't load the subtitle at all in your demo (because
language
is "" I think)What browser and OS are you using? Chromium and Firefox. Happens on all OSes
For embedded devices (smart TVs, etc.), what model and firmware version are you using? Doesn't apply
What are the manifest and license server URIs? Doesn't apply. See code below
What configuration are you using? What is the output of
player.getConfiguration()
?What did you do? Load a manifest with colored subtitles and add a
cuechange
listener logging the active cue and itsgetCueAsHTML()
result.Then play the video and watch the debug log. When the subtitle says the name of a color ex "cyan", "lime", "red", these should have matching classes. This is needed to implement custom subtitle cue display logic.
What did you expect to happen? The cue at 0:16 for example should have this text:
<c.lime>lime</c>
(or<c.lime>lime</c.lime>
which seems to be valid too and works with your xml parser) and thegetCueAsHTML()
value should be<span class="lime">lime</span>
. This is how it works in dash.js, hls.js and how it worked in Shaka before d882d28 (3.1.0).What actually happened? The cue at 0:16 just has the text
lime
and thegetCueAsHTML()
value is also justlime
.In addition to that, all the cues are actually displayed as white, rather than their real colors, but this is already covered by #4479