m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Dockerfile for transcription and Speaker Diarization #909

Open kowshik24 opened 3 weeks ago

kowshik24 commented 3 weeks ago

I ran into many issues while building a Dockerfile for transcription and speaker diarization. Is there a Git repo available for that? Or are you planning to create a Dockerfile specifically for RunPod serverless?

randyburden commented 3 weeks ago

I use the pre-built Docker images from this repo: https://github.com/jim60105/docker-whisperX

kowshik24 commented 3 weeks ago

@randyburden thanks for sharing. I found the same repo. Do you have a Docker Hub repo for that?

randyburden commented 3 weeks ago

@kowshik24, no, I don't use Docker Hub. The Dockerfile below pulls the WhisperX image from the GitHub Container Registry and builds a customized image that preloads and caches the Pyannote models for offline use. I then upload that image to Azure Container Services.

# Define optional arguments that indicate the OpenAI Whisper model size and language to use
ARG WHISPER_MODEL=medium
ARG LANG=en

# Get the base WhisperX Docker image (https://github.com/jim60105/docker-whisperX)
FROM ghcr.io/jim60105/whisperx:${WHISPER_MODEL}-${LANG}

# Define the required argument for the huggingface.co token used by Pyannote (diarization/speaker-recognition library)
ARG HUGGING_FACE_TOKEN

# Report whether the token argument was supplied, without printing the secret itself to the build log
RUN echo "Huggingface.co token supplied: ${HUGGING_FACE_TOKEN:+yes}"

# Ensure the required argument was supplied
# (test -n "") Returns false if the string is zero length
RUN test -n "$HUGGING_FACE_TOKEN" || (echo "HUGGING_FACE_TOKEN argument is required" && false)

# Preload and cache the Pyannote models so that the image can run offline
RUN python3 -c 'from pyannote.audio import Pipeline; pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token="'${HUGGING_FACE_TOKEN}'")'