shaka-project / shaka-player

JavaScript player library / DASH & HLS client / MSE-EME player
Apache License 2.0
7.24k stars 1.34k forks source link

TTML Chinese Subtitle is not displayed for nestedCues #2478

Closed kumarashu123 closed 4 years ago

kumarashu123 commented 4 years ago

Have you read the FAQ and checked for duplicate open issues? Yes

What version of Shaka Player are you using? 2.5.9

Can you reproduce the issue with our latest release version? Yes

Can you reproduce the issue with the latest code from master? Yes

Are you using the demo app or your own custom app? own custom app

If custom app, can you reproduce the issue using our demo app? Yes, able to reproduce issue with demo app

What browser and OS are you using? Chrome and Mac

For embedded devices (smart TVs, etc.), what model and firmware version are you using?

What are the manifest and license server URIs?

What did you do?

play content which has Chinese, Arabic ttml subtitle

What did you expect to happen? Chinese, Arabic ttml subtitle should be displayed

What actually happened?

Chinese, Arabic TTML subtitle is not displaying for nestedCues

After Investigation I found that In ttml_textparser.js static parseCue function
if ( Array.from(cueElement.childNodes).find( (childNode) => childNode.nodeType === Node.TEXT_NODE && /\w+/.test(childNode.textContent) ) ) {

Here /\w+/.test(childNode.textContent)

I will appreciate if you look into the issue. Thanks in advance

joeyparrish commented 4 years ago

Thank you for the report! It seems that the solution is a new Regexp feature for unicode classes like \p{L}, but those are not supported in Firefox yet. We will research our options and decide what we can do to fix this.

joeyparrish commented 4 years ago

Actually, after some testing, it's not available in Chrome stable, either.

joeyparrish commented 4 years ago

I had the syntax wrong. It does work in Chrome.

'ü'.match(/\p{L}/u)
joeyparrish commented 4 years ago

Replacing \p{L} with an equivalent character class using unicode escape sequences is just over 4kB of JavaScript, in one giant constant that nobody could review, update, or fix.

So I think it would be better to rewrite the regex to avoid the need to detect word characters at all. It's not clear from the code why \w+ is used specifically.

joeyparrish commented 4 years ago

This should be fixed in the master branch and on the nightly build of our demo. We will also attempt to cherry-pick the fix in our next bugfix release. Please let us know if things are working for you now.