Several issues with subtitles

TheOneric commented 2 years ago

There appear to be multiple issues with subtitle rendering in Bino. This was tested on Bino commit ddf9854f9dd776dd3d142870cf5e207cf29994b2 (tag: bino-1.6.8). Sample files, logs, debug patches and screenshots can be found in the attached archive: bino_i13_samples.zip. For a comparison to a correct reference there are also a few correct_* screenshots obtained from a recent mpv+libass, but they don't cover everything.

Most notably, some subtitle events go missing, i.e. they aren't rendered at all. So for the test_108p.{mkv,ass} sample Bino only displays this bino_1080p but it should actually render the following (for both sides, the anamorphic sample is similar see archive): correct_1080p__mpv

. And in the aegisub-formattest.{mkv,ass} sample several overlapping events go missing leaving only one; among other this happens e.g. with those two events:

Dialogue: 0,0:00:07.00,0:00:09.00,Default,,0000,0000,0000,,Now going to test conflict resolution. {By the way, this is a common abuse, "inline-comments" in override tag groups. They should not be rendered and not generate errors, just be i
gnored.}
Dialogue: 0,0:00:07.50,0:00:09.00,Default,,0000,0000,0000,,This line should be above the previous.

bino_aegi

At first I suspected the bitmaps were dropped somewhere during blending and afaict this could also happen if a zero-area bitmap is mixed into the results. At some places only max_x < 0 was checked; just in case I also added checks for max_y but I'm not sure if that can actually cause problems. Either way, I applied the following patch:

diff --git a/src/subtitle_renderer.cpp b/src/subtitle_renderer.cpp
index efc3cc5..05d5e38 100644
--- a/src/subtitle_renderer.cpp
+++ b/src/subtitle_renderer.cpp
@@ -459,8 +459,10 @@ bool subtitle_renderer::prerender_ass(const subtitle_box &box, int64_t timestamp
     int min_y = height;
     int max_y = -1;
     ASS_Image *img = _ass_img;
-    while (img && img->w > 0 && img->h > 0)
+    while (img)
     {
+        if(img->w == 0 && img->h == 0)
+            continue;
         if (img->dst_x < min_x)
             min_x = img->dst_x;
         if (img->dst_x + img->w - 1 > max_x)
@@ -471,7 +473,7 @@ bool subtitle_renderer::prerender_ass(const subtitle_box &box, int64_t timestamp
             max_y = img->dst_y + img->h - 1;
         img = img->next;
     }
-    if (max_x < 0)
+    if (max_x < 0 || max_y < 0)
     {
         _bb_x = 0;
         _bb_y = 0;
@@ -494,10 +496,12 @@ void subtitle_renderer::render_ass(uint32_t *bgra32_buffer)
     {
         std::memset(bgra32_buffer, 0, _bb_w * _bb_h * sizeof(uint32_t));
         ASS_Image *img = _ass_img;
-        while (img && img->w > 0 && img->h > 0)
+        while (img)
         {
-            blend_ass_image(img, bgra32_buffer);
+            if (img->w > 0 && img->h > 0)
+                blend_ass_image(img, bgra32_buffer);
             img = img->next;
+            printf("   ------ R_A: Blend Image %2d\n", i++);
         }
     }
 }
@@ -522,7 +526,7 @@ bool subtitle_renderer::prerender_img(const subtitle_box &box)
         if (img.y + img.h - 1 > max_y)
             max_y = img.y + img.h - 1;
     }
-    if (max_x < 0)
+    if (max_x < 0 || max_y < 0)
     {
         _bb_x = 0;
         _bb_y = 0;

But this didn't resolve the issue, and adding more debugging messages (see archive) revealed that the events weren't even showing up in the data fed to libass. Unfortunately I couldn't quite follow the function calls to determine where the file is read and was thus unable to look further into why this happens. In particular, the output from the aegisub-formattest.{mkv,ass} sample suggests Bino is continuously flushing or recreating the track and trying to only adding the currently active events; I suspect this is where things go wrong. Note that ASS events in a file are not required to be in chronological order. Unless expecting to work with very large subtitle streams, whose complete content does not fit into memory, it may be more robust to just feed the entire file to libass or in case of muxed packets, pass the packets to ass_process_chunk without flushing or recreating the track.

.

Furthermore, I also noticed Bino is not calling ass_set_storage_size, which means that even if the events are passed to libass the rendering may be incorrect, notably for the 1080p and anamorphic samples since 3D-transforms and other tags unfortunately depend on the videos storage resolution in ASS. To fix this, Bino should call ass_set_storage_size telling libass the video’s nominal size before anamorphic de-squeezing, i.e. the size the video is encoded in with codec-level crop applied. Just using the size the decoder reports will most-likely already match this; compare to the correct_* screenshots from the attached archive to empirically validate it matches.

.

As a final note, the ASS header used for converted subs could also be improved. Omitting the Style field entirely in the event’s Format will iinm result in “the default ASS fallback style” being used instead of the Default style specified in the [V4+ Styles] section; adding the Style field and setting it to Default for all events will resolve this. In general to get reasonable results across different video sizes, ScaledBorderAndShadow: yes\n should be added to the [Script Info] section (however due to using a custom Format line this is already the default). Specifying PlayResX and PlayResY such that they match the video’s aspect ratio might also be a good idea. For your information: using custom Format lines is a libass-specific extension which doesn't work in other ASS renderers — however since it’s only used for temporary internal conversions and not written into an actual file this shouldn't be a problem here.

marlam commented 2 years ago

Thank you for your detailed analysis, for your test samples, and for your very detailed report! It is much appreciated!

I agree that the problem is likely in the way packets are read via the FFmpeg libraries and then fed into libass.

Unfortunately there are two practical problems I face:

The whole FFmpeg related code should be rewritten (again) to match the current API of FFmpeg, see also https://github.com/marlam/bino-mirror/issues/12. It may be just me, but I find that it is entirely unclear how this should be done since libav* documentation, though improving, is still extremely poor. I have recently rewritten an unrelated piece of code that just decodes video frames, and it is suprisingly hard to even get all frames decoded regardless of format and codec, even before audio or subtitles are taken into account, and it takes hundreds of lines of code.
I was never a subtitle expert, and the ASS specification seems very vague and complex. To me it is unclear how to verify that things are working correctly except for inspecting a few test cases manually, which can never cover all the potential problems.

So while I agree that the subtitle problems should be fixed, I am not sure when I can find the time and energy to relearn all of libav* and all of libass - at least not right now.

TheOneric commented 2 years ago

I’m not familiar with ffmpeg’s library interface myself, so I cannot help much with the first issue, but I can offer some help regarding text subtitles and libass specifically.

the ASS specification seems very vague and complex. To me it is unclear how to verify that things are working correctly except for inspecting a few test cases manually

The "specification" isn't complete, fully correct or a real specification; Aegisub’s documentation is more up-to-date and probably more helpful to get an idea of how to use the format. Detailed knowledge of ASS isn't necessary to use libass to render subtitles though. ffmpeg can convert text-subtitles to ASS and if the format is ASS, libass does the bulk of the work and mostly just needs to be fed the necessary data.

Currently bino, targets a minimum libass version of 0.9.9 (March 2010), which e.g. necessitates the temporary fontconfig configuration file. Bumping the minimum version to 0.13.0 (October 2015), means there are CoreText (MacOS, iOS) and DirectWrite (Microsoft Platforms) fontproviders in addition to Fontconfig, so this logic could be dropped to simplify the process. As mentioned before Bino currently seems to recreate the track frequently (every frame or subtitle packet?), instead the track instance should be kept for the entire stream duration. For reference, here's how a simple libass usage creating Library, Renderer and Track as a triplet for a stream would look like; this should be appropriate for Bino. Other than the FONTPROVIDER_AUTODETECT part this also all works with libass 0.10.2 (October 2013) already.

bool on_stream_start(...)
{
    // Place thread locks as necessary, no libass objects are thread safe
    // Also applies to the other pseudo functions
    library = NULL;
    renderer = NULL;
    track = NULL;

    library = ass_library_init();
    if (!library)
        goto failure;
    ass_set_message_cb(library, libass_msg_callback, NULL);
    ass_set_extract_fonts(library, 1);

    renderer = ass_renderer_init(library);
    if (!renderer)
        goto failure;
    // Before being used the renderer must be configured
    // Only frame size and fonts are a technical hard requirement
    // but for correct/compatible results, I recommend the following (or more).
    // Configurations can also be updated later where it makes sense.
    ass_set_frame_size(renderer, display_width, display_height); //of one side
    // Unfortunately required due to how SSA and ASS historically developed
    // For native 3D sources, I guess the dimensions of just one view? Or only call if the source is 2D?
    ass_set_storage_size(renderer, encoded_video_width, encoded_video_height);
    // For libass >= 0.15.0 PAR will be calculated automatically from
    // storage size if set, so PAR technically doesn't need to be set if storage size was set
    ass_set_pixel_aspect(renderer, par);
    // AUTODETECT prefers the platforms native font providers (eg CoreText on MacOS)
    // if they weren't disabled at build time. No temp config file needed.
    ass_set_fonts(renderer, NULL, "sans-serif", ASS_FONTPROVIDER_AUTODETECT, NULL, 1);
    // Optional: further config like style overrides etc

    track = ass_new_track(library);
    if (!track)
        goto failure;

    // Alternatively: use ffmpeg to mux separate files into
    // the stream so this always takes the muxed path.
    if (subtitles_data_fully_available_already) {
        ass_process_data(lib, sub_data; data_size);
    } else { // Subs are muxed into (matroska) stream.
        // The ASS header is in Matroska's Codec Private section
        // and already available at the beginning
        ass_process_codec_private(track, buffer, size);
    }

    return true;

failure:
    if (track)    ass_free_track(track);
    if (renderer) ass_renderer_done(renderer);
    if (library)  ass_library_done(library);
    return false; // don't enter any of the other pseudo-functions
}

void new_subtitle_packet(long long start, long long duration, void data, size_t size)
{
    // All subtitle packets should have a starting time and duration
    // (milliseconds for libass) and the actual subtitle data chunk
    ass_process_chunk(track, data_buf, data_size, start_ms, duration_ms);
}

void render_subs(long long time_ms, ...)
{
    int change;
    ASS_Images* img = ass_render_frame(renderer, track, time_ms, &change);

    // If subtitles were blended into a distinct buffer and the previous
    // data is still available, it can be reused if there was no change
    if (!change) {
        use_previous_subbuffer();
        return;
    }

    clear_subbuffer();
    while(img) {
        // Currently Bino breaks on zero-area images; it should just continue instead
        if(img->w == 0 || img->h == 0)
            continue;
        // For full VSFilter compatibility the img->color value should also
        // be transformed before blending as described in ass_types.h
        // libass cannot do this because it depends on the video colorspace.
        blend_img_to_subbuffer(img);
        img = img->next;
    }
}

void on_stream_end(...)
{
    ass_free_track(track);
    ass_renderer_done(renderer);
    ass_library_done(library);
}

More info can be found in the documentation comments in libass headers. Since some documentation corrections and improvements were recently merged, it might be a good idea to check the descriptions on git master instead of the release at hand.

marlam commented 2 years ago

Bino is currently being rewritten completely and will be based on Qt 6, including replacing FFmpeg and libass with QtMultimedia. This means that ASS subtitles (and other types) will only be supported if they are supported within QtMultimedia. This may not yet be the case, but maybe the new QtMultimedia FFmpeg backend will solve this problem. An initial version of the new Bino is already in the main git repository, you can try it.

marlam / bino

Several issues with subtitles #13