PetrusVermaak closed this issue 10 months ago.
Whisper Timestamped could help: https://www.youtube.com/watch?v=YZSgD0RAkrg&ab_channel=HughBone https://github.com/linto-ai/whisper-timestamped
Would be the most accurate swear filter in the entire world!
Maybe this could also help: https://github.com/gulshanrana10/Audio-Profanity-Filter
So I have another project called monkeyplug, which uses speech recognition (with VOSK) to do exactly what you're talking about. Although the project description says it's for audio files, it will actually extract the audio from a video file, clean it, and then repackage it back into the mkv or mp4 container using ffmpeg.
I don't think I'm going to add speech recognition to cleanvid, since I want to keep this project's dependencies to a minimum, but if monkeyplug doesn't work for this use case I'd be interested in fixing whatever the issues are.
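The extract/clean/repackage flow described above can be sketched as two ffmpeg invocations. This is a minimal illustration, not monkeyplug's actual code: the helper names `extract_audio_cmd`, `remux_cmd`, and `run` are hypothetical, and the 16 kHz mono WAV format is an assumption (it is a common input format for VOSK models).

```python
import subprocess
from typing import List


def extract_audio_cmd(video: str, wav: str) -> List[str]:
    """Build an ffmpeg command that pulls the audio out as 16 kHz mono WAV (assumed format)."""
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", wav]


def remux_cmd(video: str, cleaned_audio: str, output: str) -> List[str]:
    """Build an ffmpeg command that keeps the original video and swaps in the cleaned audio."""
    return [
        "ffmpeg", "-y",
        "-i", video,          # input 0: original movie
        "-i", cleaned_audio,  # input 1: audio with profanity muted
        "-map", "0:v",        # take the video stream(s) from input 0
        "-map", "1:a",        # take the audio from input 1
        "-c:v", "copy",       # don't re-encode the video
        "-c:a", "aac",        # encode the cleaned audio as AAC (works in mp4 and mkv)
        output,               # container (mkv/mp4) is chosen from the extension
    ]


def run(cmd: List[str]) -> None:
    """Run an ffmpeg command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

Usage would be `run(extract_audio_cmd("movie.mkv", "movie.wav"))`, then recognition and muting on the WAV, then `run(remux_cmd("movie.mkv", "movie.clean.wav", "movie.new.mkv"))`. Building the argument lists separately from running them keeps the commands easy to inspect and test.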
For example, I just tested it out:
$ monkeyplug -i Oppenheimer.2023.mkv -o Oppenheimer.2023.new --output-json Oppenheimer.2023.monkeyplug.json
results in a file called Oppenheimer.2023.new.mkv with the profanity removed and (since I specified --output-json) a file called Oppenheimer.2023.monkeyplug.json with the words detected, their timestamps, and whether or not they were scrubbed:
...
{
"conf": 0.580837,
"end": 4117.287842,
"start": 4117.24,
"word": "what",
"scrub": false
},
{
"conf": 1,
"end": 4117.9,
"start": 4117.287842,
"word": "the",
"scrub": false
},
{
"conf": 1,
"end": 4118.11,
"start": 4117.9,
"word": "hell",
"scrub": true
},
{
"conf": 0.951171,
"end": 4118.23,
"start": 4118.11,
"word": "are",
"scrub": false
},
{
"conf": 1,
"end": 4118.32,
"start": 4118.23,
"word": "you",
"scrub": false
},
{
"conf": 1,
"end": 4118.65,
"start": 4118.35,
"word": "doing",
"scrub": false
},
...
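To pull the muted time ranges back out of that JSON, something like the following should work. It assumes the file holds a list of word objects with the `start`, `end`, `word`, and `scrub` keys shown above; the excerpt is truncated, so the exact top-level layout is an assumption, and `load_words` and `scrubbed_spans` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List, Tuple


def load_words(path: str) -> List[Dict[str, Any]]:
    """Read the --output-json file; assumes the top level is a list of word objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def scrubbed_spans(words: List[Dict[str, Any]]) -> List[Tuple[float, float, str]]:
    """Return (start, end, word) for every entry marked for scrubbing."""
    return [(w["start"], w["end"], w["word"]) for w in words if w.get("scrub")]
```

For the excerpt above, `scrubbed_spans` would yield a single span, 4117.9 to 4118.11 seconds, for "hell".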
Note, though, that movies have more going on (music, background noise, etc.), so recognition isn't as accurate as it is with, say, a podcast.
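One rough way to see where the recognizer struggled (over a music cue, for instance) is to list the words whose `conf` value falls below a cutoff. The 0.6 threshold and the `low_confidence`/`mean_confidence` helpers are arbitrary choices for illustration, assuming the same list-of-word-objects layout as the JSON above.

```python
from typing import Any, Dict, List, Tuple


def low_confidence(
    words: List[Dict[str, Any]], threshold: float = 0.6
) -> List[Tuple[float, str, float]]:
    """Return (start, word, conf) for recognized words below the confidence threshold."""
    return [(w["start"], w["word"], w["conf"]) for w in words if w["conf"] < threshold]


def mean_confidence(words: List[Dict[str, Any]]) -> float:
    """Average recognizer confidence over all words (0.0 for an empty list)."""
    return sum(w["conf"] for w in words) / len(words) if words else 0.0
```

A low average over a stretch of the movie suggests that section is worth checking by ear.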
Unfortunately, monkeyplug with VOSK can't match the results Whisper gets.
That's why I recommend faster-whisper, which is AI-based. It's insanely accurate with everything: music, songs, movies, you name it. It can also create subtitles in almost any language. It truly is incredible; just try it and see. The vad_filter option (voice activity detection) is what makes it so accurate, across languages.
To get a subtitle file with one word per line:
Get a copy of the Faster-Whisper standalone build for Windows: https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper
Then run the command below. The software downloads the model files the first time you run it. It only does this once, so the first run could take a while.
whisper-faster-r167.2 "C:\video.mp4" --one_word 1 --vad_filter True --language=English --model=large-v2
(According to the creator, the command above uses the optimal settings.)
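For anyone who would rather script this than use the Windows build, the Python faster-whisper package (which that standalone build is based on) exposes comparable options as `vad_filter=True` and `word_timestamps=True` on `WhisperModel.transcribe`. The sketch below writes one word per SRT cue, roughly what `--one_word 1` produces; the helper names, the CPU/int8 settings, and the exact output layout are assumptions, not the standalone tool's behavior.

```python
from typing import Iterable, List, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, text)


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: Iterable[Word]) -> str:
    """Build SRT text with one cue per word."""
    cues = []
    for i, (start, end, text) in enumerate(words, start=1):
        cues.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{text.strip()}\n")
    return "\n".join(cues)


def transcribe_words(path: str, model_size: str = "large-v2") -> List[Word]:
    """Transcribe with faster-whisper and return word-level timestamps."""
    from faster_whisper import WhisperModel  # lazy import; needs `pip install faster-whisper`

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(
        path, language="en", vad_filter=True, word_timestamps=True
    )
    # segments is a generator: the transcription runs while we iterate over it
    return [(w.start, w.end, w.word) for seg in segments for w in seg.words]
```

Usage: `srt = words_to_srt(transcribe_words("video.mp4"))`, then write `srt` to `video.srt` with UTF-8 encoding. Keeping the SRT formatting separate from the model call means it can be reused with word timestamps from any recognizer.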
Thanks for the suggestion. Perhaps I'll take a look at Whisper for monkeyplug.
FWIW, @PetrusVermaak, monkeyplug now supports an OpenAI-Whisper mode. I may look at faster-whisper in a future iteration.
Any idea if Whisper can be used to clean up bad words automatically, without a subtitle file?
https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper
Could we create a predetermined list of words, run Whisper, and have it clean the audio in one go?
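That seems feasible in principle: once you have word-level timestamps (for example from faster-whisper's `word_timestamps=True`), matching them against a word list and muting those spans with ffmpeg's `volume` filter is straightforward. The sketch below assumes words arrive as `(start, end, text)` tuples; `mute_intervals`, `mute_filter`, and the padding parameter are hypothetical, and this is not how monkeyplug is implemented internally.

```python
import string
from typing import Iterable, List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase a recognized word and strip surrounding whitespace and punctuation."""
    return text.strip().strip(string.punctuation).lower()


def mute_intervals(
    words: Iterable[Tuple[float, float, str]],
    bad_words: Set[str],
    pad: float = 0.0,
) -> List[Tuple[float, float]]:
    """Return merged (start, end) spans covering every word found in bad_words."""
    bad = {w.lower() for w in bad_words}
    spans = sorted(
        (max(0.0, start - pad), end + pad)
        for start, end, text in words
        if normalize(text) in bad
    )
    merged: List[Tuple[float, float]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # overlapping or touching spans are folded into the previous one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mute_filter(intervals: List[Tuple[float, float]]) -> str:
    """Build an ffmpeg -af filter chain that silences each interval."""
    return ",".join(
        f"volume=enable='between(t,{s:.3f},{e:.3f})':volume=0" for s, e in intervals
    )
```

The resulting string would be passed to something like `ffmpeg -i input.mp4 -c:v copy -af "<filter>" output.mp4`; when calling ffmpeg through `subprocess` with an argument list, pass the filter as a single argument without extra shell quoting. A small `pad` (e.g. 0.1 s) helps cover recognizer timing drift.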