Multilingual Filtering - Githubissues

mmguero / cleanvid

cleanvid is a little script to mute profanity in video files

BSD 3-Clause "New" or "Revised" License

56 stars, 6 forks

Multilingual Filtering #12

Closed (hendursaga closed this 1 year ago)

hendursaga commented 1 year ago

The swears.txt file is English-only, but I like watching foreign-language films, so a multilingual dataset of bad words would be great. LDNOOBW (List of Dirty, Naughty, Obscene, and Otherwise Bad Words) by Shutterstock is probably the best dataset I've come across that could do the job.

Seeing as we already have a --lang flag, perhaps we could extend it to select which language(s) to search for bad words? The one problem would be a video that has more than one language in it. Thoughts?
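
A minimal sketch of how that could look, not cleanvid's actual code. It assumes a comma-separated language value such as "en,fr" and a directory holding one plain-text word list per language code, one entry per line. The directory layout and the names parse_langs, load_words and build_pattern are hypothetical, made up for illustration.

```python
import re
from pathlib import Path


def parse_langs(value):
    """Split a comma-separated value such as "en,fr" into a list of codes."""
    return [code.strip().lower() for code in value.split(",") if code.strip()]


def load_words(list_dir, langs):
    """Merge the word lists for the requested languages into one set."""
    words = set()
    for lang in langs:
        # Hypothetical layout: one file per language code inside list_dir.
        path = Path(list_dir) / lang
        if not path.is_file():
            continue  # no list for this language; skip rather than fail
        for line in path.read_text(encoding="utf-8").splitlines():
            word = line.strip()
            if word:
                words.add(word.lower())
    return words


def build_pattern(words):
    """Compile one case-insensitive regex that matches any listed word."""
    if not words:
        return None
    # Longest entries first so multi-word phrases win over their prefixes.
    alternatives = sorted(words, key=len, reverse=True)
    body = "|".join(re.escape(w) for w in alternatives)
    # Lookarounds keep a listed word from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)
```

Merging the lists would be one way to handle a video with more than one language: every requested list is applied to every subtitle line. The trade-off is that a word that is harmless in one language but listed in another would also be matched.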

mmguero commented 1 year ago

Thanks for the suggestion! I'll check it out.

mmguero commented 1 year ago

Apologies, I don't see myself spending time on this. A well-considered PR would be accepted.