Use Whisper to clean without subtitles

mmguero / cleanvid

cleanvid is a little script to mute profanity in video files

BSD 3-Clause "New" or "Revised" License


Use Whisper to clean without subtitles #25

Closed PetrusVermaak closed 10 months ago

PetrusVermaak commented 11 months ago

Any idea if Whisper can be used to clean up bad words automatically, without a subtitle file?

https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper

We could create a predetermined list of words, run whisper, and then it cleans the audio in one go?
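The idea could be sketched roughly like this. It is only a minimal illustration, assuming the recognizer emits word-level entries with `word`, `start` and `end` fields (times in seconds). The word list, the `find_mute_spans` helper and the padding value are hypothetical and are not part of cleanvid or Whisper.

```python
import string

# Hypothetical word list; a real one would be loaded from a file.
BAD_WORDS = {"hell", "damn"}


def find_mute_spans(words, bad_words=BAD_WORDS, pad=0.05):
    """Return merged (start, end) spans, in seconds, covering listed words.

    `words` is assumed to be an iterable of dicts with "word", "start"
    and "end" keys, as a word-level speech recognizer might emit.
    """
    spans = []
    for w in words:
        # Normalize so that "Hell," and "hell" compare equal.
        token = w["word"].strip().strip(string.punctuation).lower()
        if token in bad_words:
            spans.append((max(0.0, w["start"] - pad), w["end"] + pad))

    # Merge spans that touch or overlap once padding is applied.
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

The merged spans are what an audio tool would then need to silence; the padding is there because word boundaries from a recognizer are rarely exact.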

PetrusVermaak commented 11 months ago

Whisper Timestamped could help: https://www.youtube.com/watch?v=YZSgD0RAkrg&ab_channel=HughBone and https://github.com/linto-ai/whisper-timestamped

Would be the most accurate swear filter in the entire world!

PetrusVermaak commented 11 months ago

Maybe this could also help: https://github.com/gulshanrana10/Audio-Profanity-Filter

mmguero commented 10 months ago

So I have another project called monkeyplug which uses speech recognition (with VOSK) to do just what you're talking about. While the description of the project there says it's for audio files, it will actually extract the audio from video files, clean it, and then repackage it back into the mkv or mp4 container using ffmpeg.
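To illustrate the ffmpeg side of that kind of pipeline, here is a rough sketch that builds a command muting a list of time spans while copying the video stream untouched. This is an assumption about one way to do it with ffmpeg's `volume` filter and its timeline `enable` option, not a description of monkeyplug's actual internals; `build_mute_command` and the file names are hypothetical.

```python
def build_mute_command(src, dst, spans):
    """Build an ffmpeg argument list that silences each (start, end) span.

    The video stream is copied as-is; the audio is re-encoded because it
    passes through a filter. Spans are given in seconds.
    """
    filters = [
        f"volume=enable='between(t,{start:.3f},{end:.3f})':volume=0"
        for start, end in spans
    ]
    cmd = ["ffmpeg", "-i", src, "-c:v", "copy"]
    if filters:
        # Several volume filters are chained with commas in one -af value.
        cmd += ["-af", ",".join(filters)]
    cmd.append(dst)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. Passing a list rather than a shell string means the single quotes inside the filter reach ffmpeg unchanged.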

I don't think I'm going to add recognition capabilities to cleanvid, since I want to keep this project's dependencies down, but if monkeyplug doesn't work for this use case I'd be interested in fixing whatever those issues are.

mmguero commented 10 months ago

For example, I just tested it out:

$ monkeyplug -i Oppenheimer.2023.mkv -o Oppenheimer.2023.new --output-json Oppenheimer.2023.monkeyplug.json

results in a file called Oppenheimer.2023.new.mkv with the profanity removed, and (since I specified --output-json) a file called Oppenheimer.2023.monkeyplug.json with the words detected, their timestamps, and whether or not they were scrubbed:

...
  {
    "conf": 0.580837,
    "end": 4117.287842,
    "start": 4117.24,
    "word": "what",
    "scrub": false
  },
  {
    "conf": 1,
    "end": 4117.9,
    "start": 4117.287842,
    "word": "the",
    "scrub": false
  },
  {
    "conf": 1,
    "end": 4118.11,
    "start": 4117.9,
    "word": "hell",
    "scrub": true
  },
  {
    "conf": 0.951171,
    "end": 4118.23,
    "start": 4118.11,
    "word": "are",
    "scrub": false
  },
  {
    "conf": 1,
    "end": 4118.32,
    "start": 4118.23,
    "word": "you",
    "scrub": false
  },
  {
    "conf": 1,
    "end": 4118.65,
    "start": 4118.35,
    "word": "doing",
    "scrub": false
  },
...
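A report like this is easy to post-process. The sketch below lists only the scrubbed words with human-readable timestamps. It assumes the file's top level is a JSON array of entries shaped like the ones in the excerpt; since the excerpt is truncated, the real layout may differ. The helper names are hypothetical.

```python
import json


def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def scrubbed_words(entries):
    """Return (timestamp, word, confidence) for entries flagged as scrubbed."""
    return [
        (format_timestamp(e["start"]), e["word"], e["conf"])
        for e in entries
        if e.get("scrub")
    ]


# Two entries copied from the excerpt; the enclosing array is an assumption.
sample = json.loads("""
[
  {"conf": 1, "end": 4117.9, "start": 4117.287842, "word": "the", "scrub": false},
  {"conf": 1, "end": 4118.11, "start": 4117.9, "word": "hell", "scrub": true}
]
""")

for stamp, word, conf in scrubbed_words(sample):
    print(stamp, word, conf)
```

Keeping the `conf` value in the output makes it easier to spot low-confidence detections that may be worth checking by ear.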

mmguero commented 10 months ago

Note, though, that movies have more going on in the audio (music, background noise, etc.), so recognition is not as accurate as it is with, say, a podcast.

PetrusVermaak commented 10 months ago

Unfortunately, monkeyplug with VOSK cannot produce the same results as Whisper.

That's why I recommend Whisper Faster, which is AI-based. It's insanely accurate with everything: music, songs, movies, you name it. It can also create subtitles for almost every language. It truly is incredible. Just try this and see how amazing it is. The vad_filter is what makes it rock for accuracy and languages.

To get a subtitle file with one word per line:

Get a copy of whisper-faster for Windows: https://github.com/Purfview/whisper-standalone-win/releases/tag/faster-whisper

The software will download files the first time you run it. It only does this once, so the first run could take a while.

whisper-faster-r167.2 "C:\video.mp4" --one_word 1 --vad_filter True --language=English --model=large-v2

(The command above uses the optimal best-practice settings, confirmed by the creator.)
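A subtitle file with one word per cue could then be read back into word-level timestamps with a small parser. The sketch below assumes the output is standard SRT (an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then the text); I have not verified the exact output of this particular build, and `parse_word_srt` is a hypothetical helper.

```python
import re

# Matches an SRT timing line such as "01:08:37,900 --> 01:08:38,110".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h, m, s, ms):
    """Convert the four captured time fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_word_srt(text):
    """Yield {"word", "start", "end"} dicts from SRT text with one word per cue."""
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                groups = match.groups()
                yield {
                    # Everything after the timing line is the cue text.
                    "word": " ".join(lines[i + 1:]),
                    "start": _seconds(*groups[:4]),
                    "end": _seconds(*groups[4:]),
                }
                break
```

Each yielded entry carries the word and its start and end time in seconds, which is enough to compare every word against a list and decide which spans to mute.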

mmguero commented 10 months ago

Thanks for the suggestion. Perhaps I'll take a look at whisper for monkeyplug.

mmguero commented 9 months ago

FWIW, @PetrusVermaak, monkeyplug now supports an OpenAI-Whisper mode. I may look at faster-whisper in a future iteration.