mpv-player / mpv

🎥 Command line video player
https://mpv.io

RTL display for subtitles #12978

Open ShlomoCode opened 9 months ago

ShlomoCode commented 9 months ago


Expected behavior of the wanted feature

An option to display subtitles with correct RTL (right-to-left) layout, either automatic (by detecting the subtitle language) or via a command line option. This is needed for translated subtitles, where the translation always leaves certain words in English, and those words then escape to the right side of the display.

This issue was opened here as a follow-up to a problem I reported in IINA, which uses mpv behind the scenes: https://github.com/iina/iina/issues/4698

Log file

I'm using iina which uses mpv behind the scenes so I have no idea how to get a log file.

llyyr commented 9 months ago
diff --git a/sub/sd_ass.c b/sub/sd_ass.c
index 6742f6f658..1b139b1a06 100644
--- a/sub/sd_ass.c
+++ b/sub/sd_ass.c
@@ -448,8 +448,10 @@ static void configure_ass(struct sd *sd, struct mp_osd_res *dim,
     ass_set_hinting(priv, set_hinting);
     ass_set_line_spacing(priv, set_line_spacing);
 #if LIBASS_VERSION >= 0x01600010
-    if (converted)
+    if (converted) {
         ass_track_set_feature(track, ASS_FEATURE_WRAP_UNICODE, 1);
+        ass_track_set_feature(track, ASS_FEATURE_WHOLE_TEXT_LAYOUT, 1);
+    }
 #endif
     if (converted) {
         bool override_playres = true;

Can you try this diff? Or alternatively provide sample subtitles. If it works, then hooking it up to an option should be trivial.

ShlomoCode commented 9 months ago

Or alternatively provide sample subtitles

video, subtitles. For example, at position 0:20 in the video there is a sentence in Hebrew with one word in English.

llyyr commented 9 months ago

subtitles

url doesn't work for me, can you just upload the file to github?

ShlomoCode commented 9 months ago

GitHub does not allow uploading .srt files...

We don’t support that file type.

Try again with GIF, JPEG, JPG, MOV, MP4, PNG, SVG, WEBM, CPUPROFILE, CSV, DMP, DOCX, FODG, FODP, FODS, FODT, GZ, JSON, JSONC, LOG, MD, ODF, ODG, ODP, ODS, ODT, PATCH, PDF, PPTX, TGZ, TXT, XLS, XLSX or ZIP.

Here it is in a zip file: [Hebrew] TypeScript vs JavaScript _ Guido van Rossum and Lex Fridman [DownSub.com].srt.zip

llyyr commented 9 months ago

I couldn't reproduce the "english words escape to the right" issue. It renders how it is in the sub file

image

What version of iina are you using?

ShlomoCode commented 9 months ago

I couldn't reproduce the "english words escape to the right" issue. It renders how it is in the sub file

CleanShot 2023-11-27 at 13 54 12@2x

With correct RTL it should look like this: CleanShot 2023-11-27 at 13 58 07@2x

"It renders how it is in the sub file": that again depends on how you opened the subtitle file, since not every editor properly supports RTL. For example, VSCode supports RTL very poorly, while Microsoft Word (a text editor rather than a code editor) supports it very well.

What version of iina are you using?

1.3.3 Build 138 mpv 0.35.0-419-gf79 FFmpeg 6.0

ShlomoCode commented 9 months ago

@llyyr Thanks for this quick fix! Much appreciated :) I downloaded the artifact from the PR (the mpv-i686-w64-mingw32 file), and I can confirm that the specific case I demonstrated (position 00:00:20) is now displayed properly from right to left. However, there seem to be cases where the direction still gets confused, for example at position 00:00:34 in the video:

CleanShot 2023-11-27 at 16 01 46@2x

The word "JavaScript" should be on the right side (the beginning of the sentence in RTL), and "JavaScript EES" should be on the left side (the end of the sentence in RTL), like in Microsoft Word for example:

CleanShot 2023-11-27 at 16 07 21@2x

Or in plain HTML with the CSS property direction: rtl: CleanShot 2023-11-27 at 16 06 00@2x

Same thing for example at position 00:00:39: CleanShot 2023-11-27 at 16 06 50@2x

The word "Transpilers" is the beginning of the sentence and therefore should be displayed on the right side in a right-to-left language.

I checked and the problem also exists when using a single subtitle track.

This is the command I used:

./mpv.exe video.mp4 --sub-file=sub_he.srt --sub-file=sub_en.srt --sub-detect-rtl --sid=1 --secondary-sid=2

I am also attaching the LTR subtitles, in case you need them: subs.zip

ShlomoCode commented 9 months ago

Now I'm thinking, maybe the correction only helped in cases where the first letter in the current sentence is in Hebrew and the English word is in the center of the sentence, but if the first word in the sentence is in English, the entire sentence is defined as LTR?

avih commented 9 months ago


For reference, this subs file is auto translation of youtube to Hebrew.

Is there any player which shows this correctly? As far as I can tell, both MPC and VLC also show it broken like this. Even in Firefox at the youtube page it shows it the same (broken).

As far as I can tell it only shows correctly in chrome (based) browsers.

Maybe chrome does some magic where it knows to show it RTL primarily, which bypass the brokenness of the subs?

llyyr commented 9 months ago

Maybe chrome does some magic where it knows to show it RTL primarily, which bypass the brokenness of the subs?

Kodi displays it correctly but violates specifications while doing so https://github.com/xbmc/xbmc/pull/22663/commits/d8073163549f383dee856f312a02d5e80c769a04

edit:

The original webvtt generated by youtube itself doesn't contain any RTL markings, so it's not really possible to auto-detect this information. I'd consider these subtitles to be broken, but there's still some merit for libass to provide some API for forcing RTL rendering

avih commented 9 months ago

Kodi displays it correctly but violates specifications while doing so xbmc/xbmc@d807316

As far as I can tell, this patch replaces LTR/RTL marks encoded as HTML literals &lrm;, &rlm; (should translate to U+200E, U+200F, respectively) with the equivalent embedded marks (U+202A, U+202B), because, according to the patch, libass only interprets the latter?
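To make the transformation described above concrete, here is a minimal sketch in C. It is not the Kodi code: the function name is hypothetical, and it assumes the marks appear in the text as the literal HTML entities `&lrm;` and `&rlm;`, which it rewrites in place to UTF-8 encoded U+202A (LEFT-TO-RIGHT EMBEDDING) and U+202B (RIGHT-TO-LEFT EMBEDDING).

```c
#include <string.h>

#define LRE_UTF8 "\xE2\x80\xAA" /* U+202A LEFT-TO-RIGHT EMBEDDING */
#define RLE_UTF8 "\xE2\x80\xAB" /* U+202B RIGHT-TO-LEFT EMBEDDING */

/* Hypothetical helper: rewrite "&lrm;" / "&rlm;" in a NUL-terminated UTF-8
 * string to embedding marks. Each 5-byte entity becomes a 3-byte sequence,
 * so the result never grows and the rewrite can be done in place. */
void entities_to_embedding_marks(char *s)
{
    char *dst = s;
    while (*s) {
        if (strncmp(s, "&lrm;", 5) == 0) {
            memcpy(dst, LRE_UTF8, 3);
            dst += 3;
            s += 5;
        } else if (strncmp(s, "&rlm;", 5) == 0) {
            memcpy(dst, RLE_UTF8, 3);
            dst += 3;
            s += 5;
        } else {
            *dst++ = *s++; /* copy every other byte unchanged */
        }
    }
    *dst = '\0';
}
```

Since the subtitles in this issue contain no such entities at all, a rewrite like this would leave them unchanged.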

However, neither the SRT nor the original VTT from which it was converted have any RTL marks (HTML/direct/embedded), so basically whoever renders it simply can't know that it's primarily RTL.

TL;DR: these subs are broken.

The reason Chrome renders it correctly is that the page applies CSS which overrides the direction to RTL, but that info is not conveyed in the vtt/srt files.

libass (via fribidi) can guess the direction of most lines correctly from the first word of the line, but this only works for lines which begin in Hebrew (and this vtt/srt also has lines which should be RTL but begin in English, so those would be broken anyway with autodetection).
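The "first word of the line" guess can be illustrated with a small C sketch of a first-strong-character heuristic. This is not fribidi's implementation: the function names are hypothetical, and the character classification is a deliberately rough approximation (ASCII and Latin letters count as strong LTR, the Hebrew and Arabic blocks as strong RTL, everything else as neutral).

```c
#include <stddef.h>
#include <stdint.h>

enum base_dir { DIR_NEUTRAL, DIR_LTR, DIR_RTL };

/* Decode one UTF-8 code point from s into *cp and return the number of
 * bytes consumed. Malformed input is consumed one byte at a time and
 * reported as U+FFFD. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Rough approximation of "strong" bidi classes; a real implementation
 * would consult the Unicode character database. */
static enum base_dir strong_class(uint32_t cp)
{
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z') || cp == 0x200E)
        return DIR_LTR;
    if (cp >= 0x00C0 && cp <= 0x024F && cp != 0x00D7 && cp != 0x00F7)
        return DIR_LTR;
    if ((cp >= 0x0590 && cp <= 0x08FF) || (cp >= 0xFB1D && cp <= 0xFDFF) ||
        (cp >= 0xFE70 && cp <= 0xFEFC) || cp == 0x200F)
        return DIR_RTL;
    return DIR_NEUTRAL;
}

/* Return the direction of the first strong character in a UTF-8 line,
 * or DIR_NEUTRAL if the line has none (digits and punctuation only). */
enum base_dir first_strong_direction(const char *line)
{
    const unsigned char *s = (const unsigned char *)line;
    while (*s) {
        uint32_t cp;
        s += decode_utf8(s, &cp);
        enum base_dir d = strong_class(cp);
        if (d != DIR_NEUTRAL)
            return d;
    }
    return DIR_NEUTRAL;
}
```

With this heuristic, a Hebrew line with an English word in the middle is classified as RTL, while a line that opens with "JavaScript" or "Transpilers" is classified as LTR even when the rest of it is Hebrew, which is the failure mode reported at 00:00:34 and 00:00:39.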

Secondly, libass doesn't enable autodetection in fribidi by default, for compatibility with vsfilter.

To enable autodetection (which would still be broken with lines which begin in English), we'd need #12985 .

To allow the user to force RTL as the primary direction for all lines (in fribidi, instead of autodetection), libass would need to add support which currently doesn't exist.

However, the bottom line is that the subs are broken. The RTL info is conveyed through a side channel in the browser, and it's not part of the subs themselves.

Any auto detection, or force options, would be ugly workarounds for broken subs.
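Since the conclusion above is that the direction information is missing from the file itself, one possible repair is to add it to the text of each line. The C sketch below wraps a line in U+202B (RIGHT-TO-LEFT EMBEDDING) and U+202C (POP DIRECTIONAL FORMATTING). The function name is hypothetical, and this thread does not establish how mpv or libass would render the result, so treat that as an assumption to test.

```c
#include <stddef.h>
#include <string.h>

#define RLE_UTF8 "\xE2\x80\xAB" /* U+202B RIGHT-TO-LEFT EMBEDDING */
#define PDF_UTF8 "\xE2\x80\xAC" /* U+202C POP DIRECTIONAL FORMATTING */

/* Hypothetical helper: wrap one UTF-8 subtitle line in RLE ... PDF.
 * Returns the number of bytes required, including the terminating NUL.
 * The wrapped line is written to out only if it fits in out_size bytes. */
size_t wrap_line_rtl(const char *line, char *out, size_t out_size)
{
    size_t len = strlen(line);
    size_t needed = 3 + len + 3 + 1;
    if (out != NULL && needed <= out_size) {
        memcpy(out, RLE_UTF8, 3);
        memcpy(out + 3, line, len);
        memcpy(out + 3 + len, PDF_UTF8, 3);
        out[needed - 1] = '\0';
    }
    return needed;
}
```

A caller can pass a buffer that is too small (or NULL) first to learn the required size, then call again with a buffer of that size.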

ShlomoCode commented 9 months ago

I think automatic detection based on paragraph initiation is common behavior. This is actually the default for browsers, meaning that when RTL/LTR is not explicitly set it will be auto https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes#dir

0xifarouk commented 5 months ago

Any updates on this issue please?