How to transcribe podcast audio (WhisperX with speaker diarization)

swyxio commented 1 year ago

category: tutorial
slug: transcribe-podcasts-with-whisper
tag: podcasts, ai, whisper
cover_image: https://user-images.githubusercontent.com/6764957/221219413-e83cec72-3164-40ad-bd48-0ca41616f224.png

Note: sometimes WhisperX is WAAYYYY too slow so I often end up using https://github.com/ggerganov/whisper.cpp which somehow runs much faster.

I do a lot of podcast transcription work and had need for it again today. The HuggingFace spaces (like this one https://huggingface.co/spaces/vumichien/whisper-speaker-diarization) always error out so aren't very useful.

This is the one that worked for me.

Note: if you run into a "New error: 'soundfile' backend is not available" error, run conda install -c conda-forge libsndfile to fix it.

1. Make sure you have a .wav file of your podcast audio. You can use QuickTime or Audacity to convert it. This process doesn't work for mp3.
2. Install WhisperX with the command below. This will take a couple of minutes. Meanwhile...
   pip3 install git+https://github.com/m-bain/whisperx.git
3. Read https://github.com/m-bain/whisperX#voice-activity-detection-filtering--diarization. To enable VAD filtering and diarization, pass your Hugging Face access token (which you can generate in your Hugging Face account settings) after the --hf_token argument, and accept the user agreement for each of the following models: Segmentation, Voice Activity Detection (VAD), and Speaker Diarization. Make sure to accept them all in your Hugging Face account.
4. For 3 speakers in English, run the command below. Remember, it must be a .wav file.
   whisperx YOUR_AUDIO_FILE.wav --hf_token YOUR_HF_TOKEN_HERE --vad_filter --diarize --min_speakers 3 --max_speakers 3 --language en
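
The steps above can be sketched as a small dry-run helper that prints the commands rather than running them, so you can check them before spending an hour on a transcription. This is only a sketch under a few assumptions: transcribe_plan is a hypothetical helper name, HF_TOKEN is an assumed environment variable for your Hugging Face token, and the conversion uses ffmpeg (the post itself converts with QuickTime or Audacity). It also assumes simple filenames without spaces.

```shell
# Dry-run sketch: prints the commands instead of running them.
# transcribe_plan is a hypothetical helper, not part of WhisperX.
transcribe_plan() {
  local input="$1" speakers="$2" lang="${3:-en}"
  local wav="$input"

  # The process above needs a .wav, so plan a conversion for anything else.
  # Assumption: ffmpeg is installed (swapped in for QuickTime/Audacity).
  case "$input" in
    *.wav) ;;
    *)
      wav="${input%.*}.wav"
      printf 'ffmpeg -i %s %s\n' "$input" "$wav"
      ;;
  esac

  # Same flags as the whisperx command above; min and max speakers are
  # set to the same value, as in the original example.
  printf 'whisperx %s --hf_token %s --vad_filter --diarize --min_speakers %s --max_speakers %s --language %s\n' \
    "$wav" "${HF_TOKEN:-YOUR_HF_TOKEN_HERE}" "$speakers" "$speakers" "$lang"
}

# Plan a 3-speaker English episode that currently exists only as an mp3.
transcribe_plan episode.mp3 3 en
```

Once the printed commands look right, you can copy them into your terminal one at a time.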

Transcription runs at roughly real time: it takes about 30 seconds to transcribe 30 seconds of audio, so expect the job to take about as long as the podcast episode itself.
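
As a rough planning aid, assuming the roughly real-time speed reported above holds for your machine, you can estimate the wait from the episode length. estimate_transcribe_time is a hypothetical helper; the optional second argument is an assumed speed factor in percent (100 means real time, 50 would mean twice as fast).

```shell
# estimate_transcribe_time AUDIO_SECONDS [FACTOR_PERCENT]
# Hypothetical helper: scales the audio length by an assumed speed factor.
estimate_transcribe_time() {
  local audio_seconds="$1" factor_percent="${2:-100}"
  local total=$(( audio_seconds * factor_percent / 100 ))

  # Break the total into hours, minutes, and seconds for readability.
  printf '%dh %02dm %02ds\n' \
    $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# A 75-minute episode (4500 seconds) at real-time speed.
estimate_transcribe_time 4500   # prints "1h 15m 00s"
```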

swyxio commented 1 year ago

Related reads:

cmuhire commented 1 year ago

@sw-yx after reading the above I think you'll appreciate this (think DHH's famous "blog in 15 minutes with Rails" but for embedding real-time Whisper audio transcription in a Phoenix app) 🙂 https://www.youtube.com/watch?v=Yd220Te8cHc

Most of the ML community has not yet caught up with the incredible tooling development that's been quietly brewing in the Elixir community, through projects like the Nx ecosystem and Livebook, in just the last 2 (!) years. I expect this to change soon as things start to come together into a coherent story that showcases the unique value prop of the platform (in no small part thanks to the unique guarantees of the BEAM) compared to incumbent stacks like Python/R.

swyxio commented 1 year ago

v cool. will keep a look out!

