Improve readability of karaoke style captions with UITextDisplayer

Have you read the FAQ and checked for duplicate open issues? Yes

Is your feature request related to a problem? Please describe.

When displaying karaoke style captions with UITextDisplayer, shaka-player currently displays each cue separately. Because of the flex-based layout, every time a new cue appears it moves the existing ones around on the page, making them difficult to read.

Describe the solution you'd like

Instead of rendering each cue separately, UITextDisplayer could render the entire line, use visibility: hidden on the cues that shouldn't be displayed yet, and then switch them to visibility: visible when they need to appear. That would avoid layout shifts, making the captions easier to read. It would also likely lead to better performance: the browser wouldn't have to redo layout calculations for cues on the same line, and shaka-player would only have to create the elements for each line once and could then just toggle the visibility style property.

If the user seeks, it should probably force a full re-render to avoid elements staying on the screen longer than they should.
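To make the idea concrete, here is a minimal sketch of the visibility logic on its own, separate from UITextDisplayer. The `KaraokeFragment` type and both function names are hypothetical and are not part of shaka-player's API; the sketch only assumes that each fragment of a line has a known start time.

```typescript
// Hypothetical shape for one timed fragment of a karaoke line.
// This is NOT a shaka-player type; it exists only for this sketch.
interface KaraokeFragment {
  text: string; // assumed to be already HTML-escaped
  startTime: number; // seconds at which the fragment should become visible
}

type Visibility = "visible" | "hidden";

// Every fragment always occupies its space in the line;
// only its visibility depends on the current playback time.
function fragmentVisibility(
  fragments: KaraokeFragment[],
  currentTime: number,
): Visibility[] {
  return fragments.map((f) =>
    currentTime >= f.startTime ? "visible" : "hidden",
  );
}

// Build the markup for the whole line. Hidden fragments are still emitted,
// so the layout is the same no matter how much of the line is visible.
function renderLine(fragments: KaraokeFragment[], currentTime: number): string {
  return fragments
    .map((f) => {
      const state: Visibility =
        currentTime >= f.startTime ? "visible" : "hidden";
      return `<span style="visibility: ${state}">${f.text}</span>`;
    })
    .join("");
}
```

Because the state is derived purely from the current time, recomputing it after a seek (forwards or backwards) hides or shows the right fragments without any extra bookkeeping. A real implementation would presumably keep the elements around and toggle the style property rather than rebuild markup.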

Describe alternatives you've considered

Achieving the same effect by editing the subtitle files and adding the styling in there with extra invisible cues to occupy the space.

I haven't looked into the internals of UITextDisplayer much yet, so if keeping track of cues already on screen and toggling CSS properties would require too big a refactoring, the same effect could still be achieved by re-rendering everything for each cue while still using visibility: hidden on hidden cues so that they keep occupying the space.

Additional context

n/a

Are you planning to send a PR to add it?

I would be happy to write the code for it, but I would need someone else to do the actual PR because of the CLA requirement.

> I would be happy to write the code for it, but I would need someone else to do the actual PR because of the CLA requirement.

I am not a lawyer - but in a non-legal sense, I think that is a weird thing for you to do. The CLA means, in a nutshell, "I assign copyright of this code to the owner of the project". If you give it to someone else, and they sign it and claim the code as their own and then tell us we can have it... What's the actual difference to you? I mean, from a completely non-legalistic point of view, I really don't see the difference.

But, whatever, I'm not going to fight about this, I just genuinely don't get it. If you don't want to discuss it publicly, that's also fine. You can DM me on video-dev.org Slack if you want to discuss it privately. If you don't want to discuss it at all, of course, that's also fine. Just so confused over here.

> Instead of rendering each cue separately, UITextDisplayer could render the entire line, use visibility: hidden on the cues that shouldn't be displayed yet, and then switch them to visibility: visible when they need to appear. That would avoid layout shifts, making the captions easier to read. It would also likely lead to better performance: the browser wouldn't have to redo layout calculations for cues on the same line, and shaka-player would only have to create the elements for each line once and could then just toggle the visibility style property.

I don't know that UITextDisplayer is the right place to solve this. Its job is just to display what it is given. If you want it displayed differently, that would generally be handled when/where you author the subtitles. The major subtitle formats we think about for streaming video (VTT & TTML) have ample ways for these things to be styled to explicitly do what you're asking for.

Besides this, I note that you want a subtitle's position to be influenced by what is coming next.

Since you're talking about karaoke subs specifically, I think we could make a case that the sub parser is the best place to handle this. For example, I'm assuming you would use something like LRC format for this, so the LRC parser. If we know at the parser level that it's karaoke/lyrics, the parser could look ahead and style the parsed subs appropriately for any coming overlaps.

What do you think?

> I am not a lawyer - but in a non-legal sense, I think that is a weird thing for you to do. The CLA means, in a nutshell, "I assign copyright of this code to the owner of the project". If you give it to someone else, and they sign it and claim the code as their own and then tell us we can have it... What's the actual difference to you? I mean, from a completely non-legalistic point of view, I really don't see the difference.
>
> But, whatever, I'm not going to fight about this, I just genuinely don't get it. If you don't want to discuss it publicly, that's also fine. You can DM me on video-dev.org Slack if you want to discuss it privately. If you don't want to discuss it at all, of course, that's also fine. Just so confused over here.

I am completely fine with other people using my code. The problem is that instead of the contribution being anonymous and implied by the project license, Google makes you sign a legal document with your legal details. To put it bluntly, regardless of the project's license, in this day and age, the moment I push my code out onto the internet I already expect that individuals and companies will violate its copyleft or copyright. Signing the CLA is also sort of moot, considering that people have opened pull requests in the past without signing the CLA, and shaka-player maintainers have then just copied the code into a separate pull request and committed it under their own name anyway, to bypass the CLA requirement. I'm not trying to throw people under the bus, just pointing out that it is unfortunate that your employer has placed these bureaucracy hurdles in the way of contributing to the project that you created, making it easier to create a custom build than to create a pull request. I guess we just have to be grateful that they even let you open source it in the first place.

> I don't know that UITextDisplayer is the right place to solve this. Its job is just to display what it is given. If you want it displayed differently, that would generally be handled when/where you author the subtitles. The major subtitle formats we think about for streaming video (VTT & TTML) have ample ways for these things to be styled to explicitly do what you're asking for.

Good point. It should be doable to output a much larger subtitle file with each line repeated once per karaoke step, and then use styling instead of karaoke timestamps to achieve the effect. The extra bandwidth is probably negligible, considering that the player is streaming much larger video files at the same time.
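A rough sketch of that preprocessing step, assuming WebVTT cue payloads that use inline `<hh:mm:ss.ttt>` karaoke timestamps. The function name, the `ExpandedCue` type, and the `pending` class name are all made up for illustration. The sketch also assumes the not-yet-sung text can be hidden by a style rule such as `::cue(.pending) { visibility: hidden; }`; whether a given text displayer honors that rule would need to be checked.

```typescript
// Hypothetical output type: one plain cue per karaoke step.
interface ExpandedCue {
  start: string;
  end: string;
  text: string;
}

// Matches inline cue timestamps such as <00:00:01.500>.
// Simplification: only the hh:mm:ss.ttt form is handled here.
const INLINE_TIMESTAMP = /<(\d{2}:\d{2}:\d{2}\.\d{3})>/;

function expandKaraokeCue(
  start: string,
  end: string,
  payload: string,
): ExpandedCue[] {
  // Splitting on a regex with a capturing group keeps the captured
  // timestamps in the result, at the odd indexes.
  const parts = payload.split(INLINE_TIMESTAMP);
  const segments: { start: string; text: string }[] = [];
  let segmentStart = start;
  parts.forEach((part, i) => {
    if (i % 2 === 1) {
      segmentStart = part; // a timestamp: the next text starts here
    } else {
      segments.push({ start: segmentStart, text: part });
    }
  });

  // Emit the full line for every step; text that has not been reached yet
  // is wrapped in a class span so it can be hidden but still take up space.
  return segments.map((segment, i) => {
    const shown = segments.slice(0, i + 1).map((s) => s.text).join("");
    const pending = segments.slice(i + 1).map((s) => s.text).join("");
    const next = segments[i + 1];
    return {
      start: segment.start,
      end: next ? next.start : end,
      text: pending ? `${shown}<c.pending>${pending}</c>` : shown,
    };
  });
}
```

Each karaoke cue with n timestamps becomes n + 1 ordinary cues that all contain the full line, which is where the larger file size comes from.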

> For example, I'm assuming you would use something like LRC format for this, so the LRC parser.

We are using WebVTT with the karaoke timestamps for speaker text. FYI, shaka-player's LRC parser only handles full lines, not the non-standard LRC extensions for karaoke style text, though that is not relevant to this discussion.

As you advised above, it's probably better if we just modify the subtitle files before passing them to shaka-player. That way we get readable text that doesn't shift around on the screen, shaka-player won't require any code changes, and I don't have to deal with the hurdles of contributing code to shaka-player. I'll close this feature request.

> it is unfortunate that your employer has placed these bureaucracy hurdles in the way of contributing to the project

I completely agree.

shaka-project / shaka-player

Improve readability of karaoke style captions with UITextDisplayer #7610