Closed Kenifornication closed 3 years ago
Matching is case-insensitive, but how this is done depends on the libc, and most likely works in ASCII only. It does not work on bitmap/image subtitles. Unavailable on inferior OSes (requires POSIX regex support)
Probably because of this.
Matching is case-insensitive, but how this is done depends on the libc, and most likely works in ASCII only. It does not work on bitmap/image subtitles. Unavailable on inferior OSes (requires POSIX regex support)
Probably because of this.
Which part of it, exactly? I'm on Windows, does it not support POSIX regex?
Matching is case-insensitive, but how this is done depends on the libc, and most likely works in ASCII only. It does not work on bitmap/image subtitles. Unavailable on inferior OSes (requires POSIX regex support)
Probably because of this.
Which part of it, exactly? I'm on Windows, does it not support POSIX regex?
https://github.com/mpv-player/mpv/blob/0c1544e66cb980bd8a1dedb13ac07904380da608/sub/sd_ass.c#L65-L72
HAVE_POSIX
doesn't seem to exist on windows
Yup, this won't work on windows as it is currently implemented. Feel free to open a feature request I suppose. No clue how feasible it is or isn't.
Well, while there are various regex libs around which work on windows, some of them even quite small (not necessarily with POSIX API though), an interesting approach could be to use lua or js - both of which support pattern matching, and mpv is typically built with at least lua support. It shouldn't be too hard to write a similar --sub-filter-lua
which takes lua patterns instead of posix patterns and does the same thing by running the text through an instance of a lua script.
Or, much simpler and possibly useful, if HAVE_POSIX is not defined, run the same filter as case-insensitive string matcher. This doesn't even need any lib, nor lua/js.
run the same filter as case-insensitive string matcher
@Kenifornication would that be useful to you? (I've never used this filter, so no idea how common are proper regex expression vs plain string comparison).
run the same filter as case-insensitive string matcher
@Kenifornication would that be useful to you? (I've never used this filter, so no idea how common are proper regex expression vs plain string comparison).
For sure, it's better than nothing. I can't speak for other use cases but as someone who uses subtitles for everything I would say that the text you'd most likely want to hide are ads (like in the example) or notices like "this program contains..." and stuff like that, so case-sensitivity is not required, as long as the option allows for multiple terms to filter.
For sure, it's better than nothing
Well, only if it's useful enough in practice to actually get used.
So, may I ask, if the option was working for you as originally intended (regex), which values would you use with it?
So, may I ask, if the option was working for you as originally intended (regex), which values would you use with it?
Here's how I have it in my config file right now:
sub-filter-regex-append=opensubtitles|addic7ed|hiventy|red bee media|explosiveskull|^...$
The last one is for French teletext-style subtitles which have three dot lines as "filler". Not sure if that's of any help to you.
sub-filter-regex-append=opensubtitles|addic7ed|hiventy|red bee media|explosiveskull|^...$
The last one is for French teletext-style subtitles which have three dot lines as "filler".
That doesn't match only "three dot lines" - assuming dot is literal dot. It matches three any-char lines. Three literal dots would be ^\.\.\.$
(or maybe ^\\.\\.\\.$
to account for C/JSON escapes interpreted by the options parser).
So anyway, except the last one (^...$
) which wants full-string, all the others could use case-insensitive-plain-substring match - with multiple values, like so (if we had such option):
sub-filter-str=opensubtitles,addic7ed,hiventy,red bee media,explosiveskull
Not sure if that's of any help to you.
I'm just trying to assess if there's anything useful we could do without too much effort where regex is not available. And I think it does help, because at least for your use case, it seems that a simple string matcher would cover almost all of your use case, and possibly even all of it, assuming "three dot" means three literal dots (if we choose to interpret ^
and $
too, or allow some other way to specify full-string).
So overall, good info IMHO. Thanks.
If anyone else uses --sub-filter-regex
and wants to share their values to help us collect some "telemetry", that could help too.
For instance, I thought this syntax could be both trivial to implement and also reasonably useful (and would cover all OP's use cases, assuming "three dot" is literal), for a potential new string-list option --sub-filter-str
:
VALUE := [/[ibe]/]STR
Where the initial /.../
part is optional, but if it exists then inside it i
means STR match would be case-insensitive, b
means it matches STR at the begining of the line, and e
for matching STR at the end of the line, where any combination of these letters is alowed (including none), and everything which follows the second /
is taken literally (STR).
Without /.../
at the begining (or with empty //
) it would be case-sensitive substring match.
If STR begins with /
then a /.../
section is mandatory before STR, possibly empty.
Examples:
foo
or //foo
: case-sensitive match for substring foo
./i/foo
: case-INsensitive match for substring foo
./bi/foo
: case-INsensitive match for substring foo
at the begining of the line./e/foo
: the end of the line is exactly foo
./be/foo
: the line is exactly foo
.///bar
: case-sensitive match for the substring /bar
.That doesn't match only "three dot lines"
LOL I feel so dumb now. (I'd have caught that one out if the option was functional for me 😅)
Anyway, I can say as a subtitle-dependent viewer who's used subtitles in multiple languages, there aren't really a lot of situations where the average user might need to employ complicated Regular Expressions. If the the inclusion of Lua pattern option isn't a big burden, I think that would be satisfactory for most outlier use-cases where RegEx isn't available.
I think in the end it's up to you to see if it's easier to add a Lua filter option or coming up with the syntax and all that for string matching. If it was up to me I'd have simple string matching as fallback for non-POSIX builds and a Lua option, but I don't really know enough about the project to have an opinion.
mpv now supports --sub-filter-jsre
which is the same as --sub-filter-regex
but using JS regular expressions, and requires only JS support in mpv (which 3rd party Windows mpv builds usually have). Shares all the control options of the regex filter (like --sub-filter-regex-warn
).
Feel free to report how it works for you.
Feel free to report how it works for you.
Sorry for the late feedback, was waiting for Shinchiro to update his build. Anyway, I tested it today and it worked exactly the way I expected it to work. I've tested it thoroughly, including formatting and other probably scenarios. Works like a charm.
Thank you very much @avih for taking the time to do it, I wasn't expecting anything out of this ticket so I appreciate the effort you put into it.
Glad to hear. Thanks for testing.
Hello,
I've been trying to get the
--sub-filter-regex
options to work but to no avail. I don't know if I'm not using it correctly or something else.I've added the example line from the docs (
sub-filter-regex-append=opensubtitles\.org
) to mympv.conf
and created a subtitle file with only one line that containsopensubtitles.org
. The line appears normally, it is not filtered as I would expect.So either I'm misunderstanding the option's use case or something is not working as it should. Please advise.
(Unrelated, but kinda related: in the docs, the anchor of
List Options
that is mentioned repeatedly in the docs does not lead anywhere, so I still don't know what a List Options is exactly)Log file
output.txt