mpv-player / mpv

🎥 Command line video player
https://mpv.io
Other
27.94k stars 2.88k forks source link

Question about sub-filter-regex #8895

Closed Kenifornication closed 3 years ago

Kenifornication commented 3 years ago

Hello,

I've been trying to get the --sub-filter-regex options to work but to no avail. I don't know if I'm not using it correctly or something else.

I've added the example line from the docs (sub-filter-regex-append=opensubtitles\.org) to my mpv.conf and created a subtitle file with only one line that contains opensubtitles.org. The line appears normally, it is not filtered as I would expect.

So either I'm misunderstanding the option's use case or something is not working as it should. Please advise.

(Unrelated, but kinda related: in the docs, the anchor of List Options that is mentioned repeatedly in the docs does not lead anywhere, so I still don't know what a List Options is exactly)

Log file

output.txt

snylonue commented 3 years ago

Matching is case-insensitive, but how this is done depends on the libc, and most likely works in ASCII only. It does not work on bitmap/image subtitles. Unavailable on inferior OSes (requires POSIX regex support)

Probably because of this.

Kenifornication commented 3 years ago

Matching is case-insensitive, but how this is done depends on the libc, and most likely works in ASCII only. It does not work on bitmap/image subtitles. Unavailable on inferior OSes (requires POSIX regex support)

Probably because of this.

Which part of it, exactly? I'm on Windows, does it not support POSIX regex?

snylonue commented 3 years ago

Matching is case-insensitive, but how this is done depends on the libc, and most likely works in ASCII only. It does not work on bitmap/image subtitles. Unavailable on inferior OSes (requires POSIX regex support)

Probably because of this.

Which part of it, exactly? I'm on Windows, does it not support POSIX regex?

https://github.com/mpv-player/mpv/blob/0c1544e66cb980bd8a1dedb13ac07904380da608/sub/sd_ass.c#L65-L72

HAVE_POSIX doesn't seem to exist on windows

Dudemanguy commented 3 years ago

Yup, this won't work on windows as it is currently implemented. Feel free to open a feature request I suppose. No clue how feasible it is or isn't.

avih commented 3 years ago

Well, while there are various regex libs around which work on windows, some of them even quite small (not necessarily with POSIX API though), an interesting approach could be to use lua or js - both of which support pattern matching, and mpv is typically built with at least lua support. It shouldn't be too hard to write a similar --sub-filter-lua which takes lua patterns instead of posix patterns and does the same thing by running the text through an instance of a lua script.

Or, much simpler and possibly useful, if HAVE_POSIX is not defined, run the same filter as case-insensitive string matcher. This doesn't even need any lib, nor lua/js.

avih commented 3 years ago

run the same filter as case-insensitive string matcher

@Kenifornication would that be useful to you? (I've never used this filter, so no idea how common are proper regex expression vs plain string comparison).

Kenifornication commented 3 years ago

run the same filter as case-insensitive string matcher

@Kenifornication would that be useful to you? (I've never used this filter, so no idea how common are proper regex expression vs plain string comparison).

For sure, it's better than nothing. I can't speak for other use cases but as someone who uses subtitles for everything I would say that the text you'd most likely want to hide are ads (like in the example) or notices like "this program contains..." and stuff like that, so case-sensitivity is not required, as long as the option allows for multiple terms to filter.

avih commented 3 years ago

For sure, it's better than nothing

Well, only if it's useful enough in practice to actually get used.

So, may I ask, if the option was working for you as originally intended (regex), which values would you use with it?

Kenifornication commented 3 years ago

So, may I ask, if the option was working for you as originally intended (regex), which values would you use with it?

Here's how I have it in my config file right now:

sub-filter-regex-append=opensubtitles|addic7ed|hiventy|red bee media|explosiveskull|^...$

The last one is for French teletext-style subtitles which have three dot lines as "filler". Not sure if that's of any help to you.

avih commented 3 years ago

sub-filter-regex-append=opensubtitles|addic7ed|hiventy|red bee media|explosiveskull|^...$

The last one is for French teletext-style subtitles which have three dot lines as "filler".

That doesn't match only "three dot lines" - assuming dot is literal dot. It matches three any-char lines. Three literal dots would be ^\.\.\.$ (or maybe ^\\.\\.\\.$ to account for C/JSON escapes interpreted by the options parser).

So anyway, except the last one (^...$) which wants full-string, all the others could use case-insensitive-plain-substring match - with multiple values, like so (if we had such option):

sub-filter-str=opensubtitles,addic7ed,hiventy,red bee media,explosiveskull

Not sure if that's of any help to you.

I'm just trying to assess if there's anything useful we could do without too much effort where regex is not available. And I think it does help, because at least for your use case, it seems that a simple string matcher would cover almost all of your use case, and possibly even all of it, assuming "three dot" means three literal dots (if we choose to interpret ^ and $ too, or allow some other way to specify full-string).

So overall, good info IMHO. Thanks.

If anyone else uses --sub-filter-regex and wants to share their values to help us collect some "telemetry", that could help too.

avih commented 3 years ago

For instance, I thought this syntax could be both trivial to implement and also reasonably useful (and would cover all OP's use cases, assuming "three dot" is literal), for a potential new string-list option --sub-filter-str:

VALUE := [/[ibe]/]STR

Where the initial /.../ part is optional, but if it exists then inside it i means STR match would be case-insensitive, b means it matches STR at the begining of the line, and e for matching STR at the end of the line, where any combination of these letters is alowed (including none), and everything which follows the second / is taken literally (STR).

Without /.../ at the begining (or with empty //) it would be case-sensitive substring match.

If STR begins with / then a /.../ section is mandatory before STR, possibly empty.

Examples:

Kenifornication commented 3 years ago

That doesn't match only "three dot lines"

LOL I feel so dumb now. (I'd have caught that one out if the option was functional for me 😅)

Anyway, I can say as a subtitle-dependent viewer who's used subtitles in multiple languages, there aren't really a lot of situations where the average user might need to employ complicated Regular Expressions. If the the inclusion of Lua pattern option isn't a big burden, I think that would be satisfactory for most outlier use-cases where RegEx isn't available.

I think in the end it's up to you to see if it's easier to add a Lua filter option or coming up with the syntax and all that for string matching. If it was up to me I'd have simple string matching as fallback for non-POSIX builds and a Lua option, but I don't really know enough about the project to have an opinion.

avih commented 3 years ago

mpv now supports --sub-filter-jsre which is the same as --sub-filter-regex but using JS regular expressions, and requires only JS support in mpv (which 3rd party Windows mpv builds usually have). Shares all the control options of the regex filter (like --sub-filter-regex-warn).

Feel free to report how it works for you.

Kenifornication commented 3 years ago

Feel free to report how it works for you.

Sorry for the late feedback, was waiting for Shinchiro to update his build. Anyway, I tested it today and it worked exactly the way I expected it to work. I've tested it thoroughly, including formatting and other probably scenarios. Works like a charm.

Thank you very much @avih for taking the time to do it, I wasn't expecting anything out of this ticket so I appreciate the effort you put into it.

avih commented 3 years ago

Glad to hear. Thanks for testing.