Open TieBaMma opened 1 year ago
Can be reproduced, it seems that the file decoding failed. The UTF-8 encoding with a BOM will also fail.
Update: When I try to load the subtitle using ffplay, it also fails. so it's most likely an ffmpeg issue and should be reported there.
If use --demuxer-lavf-format=ass
to forcing the ffmpeg ass subtitle demuxer, mpv will successfully open this subtitle.
The issue is that FFmpeg cannot detect the subtitle format correctly.
This appears to be an issue with FFmpeg’s ass_probe, which rejects all files that don’t start with optionally preceded.
After that, I tried a dirty patch to fix the issue in FFmpeg, which can also fix another related issue https://github.com/mpv-player/mpv/issues/2846
From 4e88f228ec52d84fa84cfb892da51911b100150e Mon Sep 17 00:00:00 2001
From: dyphire <qimoge@gmail.com>
Date: Wed, 27 Sep 2023 22:16:48 +0800
Subject: [PATCH] avformat/assdec: more lenient probing
---
libavformat/assdec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavformat/assdec.c b/libavformat/assdec.c
index 6b37af15e8..e2ad3f9a34 100644
--- a/libavformat/assdec.c
+++ b/libavformat/assdec.c
@@ -34,7 +34,7 @@ typedef struct ASSContext {
static int ass_probe(const AVProbeData *p)
{
- char buf[13];
+ char buf[1000];
FFTextReader tr;
ff_text_init_buf(&tr, p->buf, p->buf_size);
@@ -43,7 +43,7 @@ static int ass_probe(const AVProbeData *p)
ff_text_read(&tr, buf, sizeof(buf));
- if (!memcmp(buf, "[Script Info]", 13))
+ if (strstr(buf, "[Script Info]") || strstr(buf, "[V4 Styles]") || strstr(buf, "[V4+ Styles]"))
return AVPROBE_SCORE_MAX;
return 0;
--
test build for windows: https://github.com/dyphire/mpv-winbuild/actions/runs/4493477864
I'm not satisfied with the way this patch works, but it's surprising that this issue has been around for over 7 years and still hasn't been fixed in FFmpeg.
I reported this to trac https://trac.ffmpeg.org/ticket/10583
While the above diff should work, ffmpeg already has code for stripping bom marks so that patch is unlikely to be accepted upstream since it doesn't address the root cause of the issue.
There are no bugs, this file just has a double BOM. I've verified that the first 4 bytes are fffe fffe
, removing the second BOM allows the file to be loaded as utf16le correctly.
There are no bugs, however it might be possible to hack lavf to load these types of files, but I'd prefer not to submit such a patch. Are there a ton of files like this?
The UTF-8 encoding with a BOM will also fail.
This is not true, I tested utf-8, utf16be and utf16le BOMs they all work correctly.
You can also work around it like so
mpv [video] --sub-file=lavf://subfile,,start,2,end,0,,:[video]
There are no bugs, however it might be possible to hack lavf to load these types of files, but I'd prefer not to submit such a patch. Are there a ton of files like this?
In fact, there are many subtitles files with this issue, and I have also seen feedback elsewhere. For this reason, I have written a python script to handle subtitle encoding conversion. see https://github.com/dyphire/subtitle-convert
However, what I am more concerned about is the issue https://github.com/mpv-player/mpv/issues/2846
There are much more subtitle files missing this header of [Script Info]
, and the users who are troubled by this are more widespread.
As mentioned there https://github.com/libass/libass/issues/325#issuecomment-452033155, it is a more reasonable behavior to additionally detect the information of [V4+ Styles]
for judgment, and no functioning ASS subtitles will be missing it.
There are much more subtitle files missing this header of [Script Info],
Even if you make ffmpeg probe such broken files correctly, it's pointless because libass doesn't know how to parse PlayResX/PlayResY without the section header. So your subtitles will appear broken anyway.
I could make such a change to lavf if libass became dumber in parsing like VSFilter, but currently there's not much point.
In fact, there are many subtitles files with this issue, and I have also seen feedback elsewhere.
Hopefully it gets merged lol https://ffmpeg.org/pipermail/ffmpeg-devel/2023-September/314767.html
Also if you have a sample of a broken UTF-8 BOM file it'd be helpful
/cc @TheOneric
Also if you have a sample of a broken UTF-8 BOM file it'd be helpful
The broken UTF-8 BOM file was only encoded from the example file, so it should be the same bug.
Even if you make ffmpeg probe such broken files correctly, it's pointless because libass doesn't know how to parse PlayResX/PlayResY without the section header. So your subtitles will appear broken anyway.
I could make such a change to lavf if libass became dumber in parsing like VSFilter, but currently there's not much point.
As mentioned in the comments above https://github.com/libass/libass/issues/325#issuecomment-452033155, libass’ own whole-file parser handles files without [Script Info]
just fine (In the past, mpv was work with use it), as vsfilter did (mpc-hc/be).
This is an ffmpeg quirk rather than a libass issue. There are thousands or more of these subtitle files on the internet (I suspect they were generated by similar scripts), and it seems foolish to not attempt to fix this compatibility issue.
libass’ own whole-file parser handles files without [Script Info] just fine, as vsfilter did (mpc-hc/be).
It can detect that it's an ass file, and playback dialogue correctly. However it won't read options without a header, unlike VSFilter.
You just get this message after fixing the ffmpeg bug, making the whole thing kind of pointless
[sub/ass] Neither PlayResX nor PlayResY defined. Assuming 384x288
libass’ own whole-file parser handles files without [Script Info] just fine, as vsfilter did (mpc-hc/be).
It can detect that it's an ass file, and playback dialogue correctly. However it won't read options without a header, unlike VSFilter.
You just get this message after fixing the ffmpeg bug, making the whole thing kind of pointless
[sub/ass] Neither PlayResX nor PlayResY defined. Assuming 384x288
I don't think this is completely meaningless. If the ass tag is not used in these subtitles (It is likely that the srt subtitles were converted using the wrong script), then it has almost no impact. Of course, perhaps missing parsing should be fixed in libass.
I have ASS subtitles downloaded from various internet sites that do not have [Script Info] section completely (the first line is [V4+ Styles]).
In addition, there are extreme examples here https://github.com/libass/libass/issues/325#issue-396578735, these ass files completely lack PlayResX and PlayResY. libass and vsfilter will do the same thing. Anyway, it's better than failing to parse without rendering.
I'll look into this in a few days, but it'd be better if libass attempted to find "PlayResX/Y:" even if it couldn't find the "[Script Info]" section header.
Of course, if there's a valid reason to not do this, then that's fair as well.
I'll look into this in a few days, but it'd be better if libass attempted to find "PlayResX/Y:" even if it couldn't find the "[Script Info]" section header.
Of course, if there's a valid reason to not do this, then that's fair as well.
I noticed that only a simple libass patch is needed to parse missing "PlayResX/Y:" when the "[Script Info]" header does not exist.
From 4cfbbe2f70c02e156785de9bc12d124b6914bd29 Mon Sep 17 00:00:00 2001
From: dyphire <qimoge@gmail.com>
Date: Wed, 27 Sep 2023 22:48:04 +0800
Subject: [PATCH] Try to parse script properties even if missing "[Script Info]"
---
libass/ass.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libass/ass.c b/libass/ass.c
index 0679483..c0734d2 100644
--- a/libass/ass.c
+++ b/libass/ass.c
@@ -1143,7 +1143,7 @@ mem_fail:
static int process_line(ASS_Track *track, char *str)
{
skip_spaces(&str);
- if (!ass_strncasecmp(str, "[Script Info]", 13)) {
+ if (!ass_strncasecmp(str, "[Script Info]", 13) || !ass_strncasecmp(str, "ScriptType:", 11)) {
track->parser_priv->state = PST_INFO;
} else if (!ass_strncasecmp(str, "[V4 Styles]", 11)) {
track->parser_priv->state = PST_STYLES;
--
The following is an example file:
[WLGO&PPX][Itazura_na_Kiss][DVDRip][05][x264_AAC][880x480][FB7AD9EF].gb.zip
After saving as a file using e.g. UTF-8 encoding, the problem disappears, but it'll be more convenient if UTF-16LE with a BOM is supported.