Closed TheOneric closed 2 years ago
Thank you for your detailed analysis, for your test samples, and for your very detailed report! It is much appreciated!
I agree that the problem is likely in the way packets are read via the FFmpeg libraries and then fed into libass.
Unfortunately there are two practical problems I face:
So while I agree that the subtitle problems should be fixed, I am not sure when I can find the time and energy to relearn all of libav* and all of libass - at least not right now.
I’m not familiar with ffmpeg’s library interface myself, so I cannot help much with the first issue, but I can offer some help regarding text subtitles and libass specifically.
the ASS specification seems very vague and complex. To me it is unclear how to verify that things are working correctly except for inspecting a few test cases manually
The "specification" isn't complete, fully correct or a real specification; Aegisub’s documentation is more up-to-date and probably more helpful to get an idea of how to use the format. Detailed knowledge of ASS isn't necessary to use libass to render subtitles though. ffmpeg can convert text-subtitles to ASS and if the format is ASS, libass does the bulk of the work and mostly just needs to be fed the necessary data.
Currently bino, targets a minimum libass version of 0.9.9 (March 2010), which e.g. necessitates the temporary fontconfig configuration file. Bumping the minimum version to 0.13.0 (October 2015), means there are CoreText (MacOS, iOS) and DirectWrite (Microsoft Platforms) fontproviders in addition to Fontconfig, so this logic could be dropped to simplify the process.
As mentioned before Bino currently seems to recreate the track frequently (every frame or subtitle packet?), instead the track instance should be kept for the entire stream duration. For reference, here's how a simple libass usage creating Library, Renderer and Track as a triplet for a stream would look like; this should be appropriate for Bino.
Other than the FONTPROVIDER_AUTODETECT
part this also all works with libass 0.10.2 (October 2013) already.
bool on_stream_start(...)
{
// Place thread locks as necessary, no libass objects are thread safe
// Also applies to the other pseudo functions
library = NULL;
renderer = NULL;
track = NULL;
library = ass_library_init();
if (!library)
goto failure;
ass_set_message_cb(library, libass_msg_callback, NULL);
ass_set_extract_fonts(library, 1);
renderer = ass_renderer_init(library);
if (!renderer)
goto failure;
// Before being used the renderer must be configured
// Only frame size and fonts are a technical hard requirement
// but for correct/compatible results, I recommend the following (or more).
// Configurations can also be updated later where it makes sense.
ass_set_frame_size(renderer, display_width, display_height); //of one side
// Unfortunately required due to how SSA and ASS historically developed
// For native 3D sources, I guess the dimensions of just one view? Or only call if the source is 2D?
ass_set_storage_size(renderer, encoded_video_width, encoded_video_height);
// For libass >= 0.15.0 PAR will be calculated automatically from
// storage size if set, so PAR technically doesn't need to be set if storage size was set
ass_set_pixel_aspect(renderer, par);
// AUTODETECT prefers the platforms native font providers (eg CoreText on MacOS)
// if they weren't disabled at build time. No temp config file needed.
ass_set_fonts(renderer, NULL, "sans-serif", ASS_FONTPROVIDER_AUTODETECT, NULL, 1);
// Optional: further config like style overrides etc
track = ass_new_track(library);
if (!track)
goto failure;
// Alternatively: use ffmpeg to mux separate files into
// the stream so this always takes the muxed path.
if (subtitles_data_fully_available_already) {
ass_process_data(lib, sub_data; data_size);
} else { // Subs are muxed into (matroska) stream.
// The ASS header is in Matroska's Codec Private section
// and already available at the beginning
ass_process_codec_private(track, buffer, size);
}
return true;
failure:
if (track) ass_free_track(track);
if (renderer) ass_renderer_done(renderer);
if (library) ass_library_done(library);
return false; // don't enter any of the other pseudo-functions
}
void new_subtitle_packet(long long start, long long duration, void data, size_t size)
{
// All subtitle packets should have a starting time and duration
// (milliseconds for libass) and the actual subtitle data chunk
ass_process_chunk(track, data_buf, data_size, start_ms, duration_ms);
}
void render_subs(long long time_ms, ...)
{
int change;
ASS_Images* img = ass_render_frame(renderer, track, time_ms, &change);
// If subtitles were blended into a distinct buffer and the previous
// data is still available, it can be reused if there was no change
if (!change) {
use_previous_subbuffer();
return;
}
clear_subbuffer();
while(img) {
// Currently Bino breaks on zero-area images; it should just continue instead
if(img->w == 0 || img->h == 0)
continue;
// For full VSFilter compatibility the img->color value should also
// be transformed before blending as described in ass_types.h
// libass cannot do this because it depends on the video colorspace.
blend_img_to_subbuffer(img);
img = img->next;
}
}
void on_stream_end(...)
{
ass_free_track(track);
ass_renderer_done(renderer);
ass_library_done(library);
}
More info can be found in the documentation comments in libass headers. Since some documentation corrections and improvements were recently merged, it might be a good idea to check the descriptions on git master instead of the release at hand.
Bino is currently being rewritten completely and will be based on Qt 6, including replacing FFmpeg and libass with QtMultimedia. This means that ASS subtitles (and other types) will only be supported if they are supported within QtMultimedia. This may not yet be the case, but maybe the new QtMultimedia FFmpeg backend will solve this problem. An initial version of the new Bino is already in the main git repository, you can try it.
There appear to be multiple issues with subtitle rendering in Bino. This was tested on Bino commit ddf9854f9dd776dd3d142870cf5e207cf29994b2 (tag: bino-1.6.8). Sample files, logs, debug patches and screenshots can be found in the attached archive: bino_i13_samples.zip. For a comparison to a correct reference there are also a few
correct_*
screenshots obtained from a recent mpv+libass, but they don't cover everything.Most notably, some subtitle events go missing, i.e. they aren't rendered at all. So for the
test_108p.{mkv,ass}
sample Bino only displays this but it should actually render the following (for both sides, the anamorphic sample is similar see archive):. And in the
aegisub-formattest.{mkv,ass}
sample several overlapping events go missing leaving only one; among other this happens e.g. with those two events:At first I suspected the bitmaps were dropped somewhere during blending and afaict this could also happen if a zero-area bitmap is mixed into the results. At some places only
max_x < 0
was checked; just in case I also added checks formax_y
but I'm not sure if that can actually cause problems. Either way, I applied the following patch:But this didn't resolve the issue, and adding more debugging messages (see archive) revealed that the events weren't even showing up in the data fed to libass. Unfortunately I couldn't quite follow the function calls to determine where the file is read and was thus unable to look further into why this happens. In particular, the output from the
aegisub-formattest.{mkv,ass}
sample suggests Bino is continuously flushing or recreating the track and trying to only adding the currently active events; I suspect this is where things go wrong. Note that ASS events in a file are not required to be in chronological order. Unless expecting to work with very large subtitle streams, whose complete content does not fit into memory, it may be more robust to just feed the entire file to libass or in case of muxed packets, pass the packets toass_process_chunk
without flushing or recreating the track..
Furthermore, I also noticed Bino is not calling
ass_set_storage_size
, which means that even if the events are passed to libass the rendering may be incorrect, notably for the1080p
andanamorphic
samples since 3D-transforms and other tags unfortunately depend on the videos storage resolution in ASS. To fix this, Bino should callass_set_storage_size
telling libass the video’s nominal size before anamorphic de-squeezing, i.e. the size the video is encoded in with codec-level crop applied. Just using the size the decoder reports will most-likely already match this; compare to thecorrect_*
screenshots from the attached archive to empirically validate it matches..
As a final note, the ASS header used for converted subs could also be improved. Omitting the
Style
field entirely in the event’sFormat
will iinm result in “the default ASS fallback style” being used instead of theDefault
style specified in the[V4+ Styles]
section; adding theStyle
field and setting it toDefault
for all events will resolve this. In general to get reasonable results across different video sizes,ScaledBorderAndShadow: yes\n
should be added to the[Script Info]
section (however due to using a custom Format line this is already the default). SpecifyingPlayResX
andPlayResY
such that they match the video’s aspect ratio might also be a good idea. For your information: using custom Format lines is a libass-specific extension which doesn't work in other ASS renderers — however since it’s only used for temporary internal conversions and not written into an actual file this shouldn't be a problem here.