m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
11.34k stars 1.19k forks

Diarization Pipeline config on diarize.py #773

Open alejandrogranizo opened 5 months ago

alejandrogranizo commented 5 months ago

Hi, I'm opening this issue because we are working from a place with connection restrictions. HuggingFace downloads fall under these restrictions, so configuring the DiarizationPipeline class has become a problem when we try to use the library's diarization feature.

We are trying to run the following code in our project:

self.diarize_model = whisperx.DiarizationPipeline(model_name='pyannote/speaker-diarization-3.1', use_auth_token='OUR_VALID_TOKEN', device='cuda')

This is the recommended way to create the diarization pipeline before starting diarization. The issue arises because of our network restrictions. The DiarizationPipeline class calls pyannote.audio's Pipeline.from_pretrained with (model_name, auth_token) and provides no way to check local resources first (such as passing the local path of a config.yml or config.yaml file as an argument). Because of how DiarizationPipeline handles the model name, it is never detected as a yml/yaml file, so instantiating the pyannote.audio Pipeline always reaches the hf_hub_download call. Since that request cannot go through on a restricted network, which is the case in many territories, execution stalls through multiple request timeouts before finally falling back to the local model that was on the filesystem from the beginning.
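The branching described above can be sketched roughly as follows (a simplified illustration of the behavior, not pyannote's actual code; resolve_checkpoint is a hypothetical helper):

```python
from pathlib import Path

def resolve_checkpoint(checkpoint: str):
    # Hypothetical helper, not pyannote's real code: only names ending
    # in .yml/.yaml are treated as local configs; anything else is
    # assumed to be a Hugging Face model ID and goes through
    # hf_hub_download, which stalls on a restricted network.
    if Path(checkpoint).suffix in (".yml", ".yaml"):
        return "local", checkpoint
    return "hf_hub_download", checkpoint
```

So passing 'pyannote/speaker-diarization-3.1' always takes the hub branch, while a filesystem path to a config.yaml would stay local.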

Is there any solution available? When using this in a larger project it is very annoying to have to wait for the multiple timeouts just to test and debug.

Thanks

Hyprnx commented 5 months ago

Hi, I'm having the same problem with network restrictions. If there is any solution, I would be interested to know. Thank you in advance.

GroovyDan commented 5 months ago

It is possible to download all the required models and reference them from a local file system. This article from AWS describes downloading all of the models to a local file system, which is similar to the approach I took. I built a Docker image that loads all the models from AWS S3 into the container during the build, and then references each model via its local path when running whisperx. Specifically for diarization, config.yaml is updated to reference local paths to the necessary models downloaded from HuggingFace:

print(">> Loading Diarization Pipeline")
diarize_model = whisperx.DiarizationPipeline(
    model_name=os.path.join(MODEL_DIR, DIARIZATION_FOLDER, "config.yaml"),
    device=DEVICE,
)
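On top of pointing everything at local paths, the Hugging Face libraries document offline environment variables that make any accidental hub lookup fail immediately instead of cycling through network timeouts. Setting them is a cheap extra guard (a suggestion beyond what GroovyDan describes, not something whisperX does for you):

```python
import os

# HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE are documented Hugging Face
# environment variables: with them set, any call that would reach the
# hub raises at once instead of retrying until timeout. They must be
# set before the libraries try to download anything.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"
```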

Hyprnx commented 5 months ago
pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: pytorch_model_embedding.bin
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: pytorch_model_segmentation.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 15
    threshold: 0.7153814381597874
  segmentation:
    min_duration_off: 0.5817029604921046
    threshold: 0.4442333667381752

device: cuda

@alejandrogranizo the above is my config.yaml. You can go to HuggingFace, sign the agreement with pyannote, download the respective model .bin files, and change the paths in the config to point at them. I implemented this and it worked on my machine, both in the cloud and on-prem. DM me if you have any problems.
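When wiring up a config like the one above, a small pre-flight check can confirm the referenced .bin files actually exist before handing the path to whisperx.DiarizationPipeline, so a typo fails fast rather than falling back toward a hub download (missing_model_files is a hypothetical helper, not part of whisperX or pyannote):

```python
from pathlib import Path

def missing_model_files(config_path):
    # Hypothetical helper: scan the embedding/segmentation entries of a
    # pyannote config.yaml and report any referenced .bin file that is
    # absent on disk, resolving relative paths against the config's
    # own directory.
    config = Path(config_path)
    missing = []
    for line in config.read_text().splitlines():
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if key in ("embedding", "segmentation") and value.endswith(".bin"):
            if not (config.parent / value).is_file():
                missing.append(value)
    return missing
```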

Hyprnx commented 5 months ago
diarization_pipeline = whisperx.DiarizationPipeline(<config.yaml>)

to initialize, and then

result = diarization_pipeline(your_audiofile_path)

this should work

Dmitriuso commented 4 months ago

@Hyprnx which version of pyannote.audio are you using? I got an error with pyannote.audio==3.1.1: "threshold parameter doesn't exist".

Hyprnx commented 4 months ago

@Dmitriuso the pyannote.audio I use comes bundled with WhisperX when I install it; I didn't install it separately.

pip install git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560

this should work

nkilm commented 3 months ago

If you are looking for a way to run whisperx completely offline, I have a script for that,

Repo - https://github.com/nkilm/offline-whisperx

You have to manually download the models and then specify their paths in the script. The script then runs fully offline, with no internet access required.

jim60105 commented 3 months ago

Try my containers that have those models built-in. https://github.com/jim60105/docker-whisperX

Use docker image save to export the image to a tar file, then docker image load to import it on your other machine. The project is open source, so you can see how I implemented it and integrate it into your own project. 😉