muzuiget / dualsub-support

Dualsub - Dual Subtitles for YouTube
https://www.dualsub.xyz/

Fixing the punctuation marks direction in the RTL languages #337

Closed Essam23 closed 3 years ago

Essam23 commented 3 years ago

Hi, on Netflix the punctuation marks in RTL languages (Arabic) are not displayed in the correct direction. It looks like this:

[image: 11]

when it should look like this:

[image: 22]

This problem could be fixed by adding

"&rlm;":"‫"

There is a hidden Unicode control character between the second pair of quotes (U+202B RIGHT-TO-LEFT EMBEDDING).

https://unicode.scarfboy.com/?s=%22%26rlm%3B%22%3A%22%E2%80%AB%22

This is the same hidden Unicode control character that Subtitle Edit uses to fix the direction of punctuation marks in RTL languages.

[image: 333]

I tested it myself and it worked just fine.
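
The proposal above amounts to swapping the `&rlm;` entity for U+202B before the subtitle text is displayed. A minimal sketch of that idea follows; the helper name `replaceRlm` is hypothetical and is not taken from the Dualsub source.

```javascript
// U+202B RIGHT-TO-LEFT EMBEDDING, written as an escape so it stays visible in source.
const RLE = "\u202b";

// Hypothetical helper: swap each "&rlm;" entity in a raw subtitle line for RLE.
function replaceRlm(line) {
  return line.replace(/&rlm;/g, RLE);
}
```

Writing the character as an escape avoids pasting an invisible character into the source file.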

muzuiget commented 3 years ago

I can't copy and paste your code into my code editor.

It is better to use \uXXXX to escape the Unicode character. Does this work?

'&lrm;': '\u202b',

Here is my test in the console:

[image: Screenshot_20210908_183519]
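
Because the control characters discussed here are invisible, it can help to list the code points of a string when checking what was actually pasted. The sketch below is one way to do that in a browser console or Node; `toCodePoints` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: list every code point of a string in U+XXXX notation,
// so invisible characters such as U+200F or U+202B become visible.
function toCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}
```

For example, `toCodePoints("\u202b.")` returns `["U+202B", "U+002E"]`.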

muzuiget commented 3 years ago

Please also read this issue https://github.com/muzuiget/dualsub-support/issues/129

Essam23 commented 3 years ago

Thank you, this worked.

'&rlm;': '\u202b',

For more information about this problem, see here:

https://github.com/rsimmons/subadub/commit/c585fda90c8c5cb742f4fcaac693bd066f25b55a

muzuiget commented 3 years ago

After reading the wiki https://en.wikipedia.org/wiki/Right-to-left_mark, I think we should use \u200f. The current code just removes the &rlm;.

So the change should be:

-    '&lrm;': '',
+    '&lrm;': '\u200e',
+    '&rlm;': '\u200f',

Essam23 commented 3 years ago

This is a subtitle file you could use for testing.

Spider-Man.2.2004.zip

Essam23 commented 3 years ago

This is how inputstream.adaptive fixed this problem

https://github.com/xbmc/inputstream.adaptive/commit/0c3f167259454d628ca883358ef359618d770909

muzuiget commented 3 years ago

It is difficult for me to recognize Arabic characters.

Essam23 commented 3 years ago

The full stop must be on the left; that is all you need.

muzuiget commented 3 years ago

The XBMC code shows that we should use \u200e and \u200f, as the wiki says:

In Unicode, the RLM character is encoded at U+200F RIGHT-TO-LEFT MARK (HTML &#8207; · &rlm;). In UTF-8 it is E2 80 8F. Usage is prescribed in the Unicode Bidi (bidirectional) Algorithm.[1]

Essam23 commented 3 years ago

It did not work; I just tried it.

[image: 33]

l={"&amp;":"&","&lt;":"<","&gt;":">","&quot;":'"',"&#39;":"'","&lrm;":"","&rlm;":"\u200f"}

muzuiget commented 3 years ago

As the issue https://github.com/muzuiget/dualsub-support/issues/129 says, you should add the CSS direction: rtl; to the HTML node.
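
The CSS approach mentioned here could look roughly like the sketch below. It is an assumption about how such a fix might be wired up, not code from issue #129 or from Dualsub: the `applyDirection` helper and the character-range heuristic for detecting RTL text are both hypothetical.

```javascript
// Rough heuristic (an assumption): treat text as RTL if it contains any
// character from the Hebrew or Arabic Unicode blocks.
const RTL_CHARS = /[\u0590-\u05FF\u0600-\u06FF]/;

// Hypothetical helper: set the text of a subtitle node and choose its CSS
// direction based on the heuristic above.
function applyDirection(node, text) {
  node.textContent = text;
  node.style.direction = RTL_CHARS.test(text) ? "rtl" : "ltr";
  return node;
}
```

In a browser, `node` would be an element such as the result of `document.createElement("span")`.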

Essam23 commented 3 years ago

I know this problem very well. The fix used in Subadub and inputstream.adaptive was my idea, and I deal with subtitles a lot, so please take my word on this.

Essam23 commented 3 years ago

This was the fix made by the developer of inputstream.adaptive, and it did not work:

https://github.com/xbmc/inputstream.adaptive/commit/d713b9557f5d0088a17ff0bca7092f7291445430

This is the one that worked:

https://github.com/xbmc/inputstream.adaptive/commit/126101883b4541ebca93925d803d8de03b69adba

That worked until Netflix made some changes to the subtitles, and we had to find a new fix.

muzuiget commented 3 years ago

Note: in https://github.com/muzuiget/dualsub-support/issues/337#issuecomment-915121227 I made a typo; it should be '&rlm;': '\u202b',.

muzuiget commented 3 years ago

Alright, replace &rlm; with \u202b.

How about &lrm;? Replace it with an empty string or \u202a?

Essam23 commented 3 years ago

For me, I cannot see the difference.

muzuiget commented 3 years ago

For consistency, I choose \u202a.

-    '&lrm;': '',
+    '&lrm;': '\u202a',
+    '&rlm;': '\u202b',

muzuiget commented 3 years ago

v1.68.0 released.
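
For reference, the mapping agreed on in this thread can be sketched as an entity table plus a small decoder. The table follows the shape of the one quoted earlier in the thread, with the two directional entries from the final diff; the `decodeSubtitleText` function and its regular expression are hypothetical and may differ from what v1.68.0 actually ships.

```javascript
// Entity table in the shape quoted earlier in the thread, with the
// directional entries from the final diff.
const ENTITIES = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&lrm;": "\u202a", // U+202A LEFT-TO-RIGHT EMBEDDING
  "&rlm;": "\u202b", // U+202B RIGHT-TO-LEFT EMBEDDING
};

// Hypothetical decoder: a single pass over the text; entities that are not
// in the table are left unchanged.
function decodeSubtitleText(text) {
  return text.replace(/&(?:#\d+|[a-z]+);/g, (entity) =>
    Object.prototype.hasOwnProperty.call(ENTITIES, entity)
      ? ENTITIES[entity]
      : entity
  );
}
```

Decoding in one pass means the output of one replacement is never decoded a second time.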

Essam23 commented 3 years ago

Thank you so much, it works perfectly.