m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

How to load a local model by default instead of going to huggingface to find a model first? #488

Open chaoqingshuai opened 1 year ago

chaoqingshuai commented 1 year ago

How to load a local model by default instead of going to huggingface to find a model first?

Downloaded the model to /data/local_models/models--guillaumekln--faster-whisper-medium using the download_root parameter.

Code used to load the model:

whisperx.load_model("medium", device, compute_type=compute_type, download_root="/data/local_models")

The terminal displays:

An error occured while synchronizing the model guillaumekln/faster-whisper-medium.en from the Hugging Face Hub:

(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url:  /api/models/guillaumekln/faster-whisper-medium.en/revision/main (Caused by NewConnectionError('< urllib3.connection.HTTPSConnection object at 0x7ff7b6c7ad70> : Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID:  88c5f925-0b45-4ba2-9b9b-a9c641f7cea6)')

Trying to load the model directly from the local cache, if it exists.
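One possible workaround (not from this thread, just a sketch): resolve the cached snapshot directory under `download_root` yourself and pass that local path to `whisperx.load_model` in place of a model name. The `resolve_local_snapshot` helper below is hypothetical; it only assumes the standard Hugging Face Hub cache layout (`models--<org>--<name>/snapshots/<revision-hash>/`).

```python
from pathlib import Path
from typing import Optional

def resolve_local_snapshot(download_root: str, repo_id: str) -> Optional[Path]:
    """Find the cached snapshot directory for repo_id under download_root.

    Hypothetical helper: relies on the standard Hugging Face cache layout
    models--<org>--<name>/snapshots/<revision-hash>/.
    """
    cache_dir = Path(download_root) / ("models--" + repo_id.replace("/", "--"))
    snapshots = cache_dir / "snapshots"
    if not snapshots.is_dir():
        return None
    # Pick the most recently modified snapshot (usually there is only one).
    candidates = sorted(snapshots.iterdir(), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None

# Usage sketch (assumes the model was already downloaded once):
# local_path = resolve_local_snapshot("/data/local_models",
#                                     "guillaumekln/faster-whisper-medium")
# if local_path is not None:
#     model = whisperx.load_model(str(local_path), device,
#                                 compute_type=compute_type)
```

Passing a filesystem path instead of a model name should skip the Hub lookup entirely, since faster-whisper treats an existing local directory as the model location.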
ElmyMaty commented 10 months ago

Would be interested as well :) Did you get any answer for this?

KossaiSbai commented 9 months ago

I am happy to give this a go, will let you guys know how I get on :)

KossaiSbai commented 9 months ago

Hey @chaoqingshuai @ElmyMaty , I just tried to replicate the issue.

I ran the following code twice:

import whisperx

if __name__ == '__main__':
    device = "cpu"
    audio_file = "audio.mp3"
    batch_size = 16  # reduce if low on GPU mem
    compute_type = "int8"  # reduced precision; lighter on memory (may reduce accuracy)

    # 1. Transcribe with original whisper (batched)
    model = whisperx.load_model("medium", device, compute_type=compute_type, download_root="/Users/kossaisbai/data/local_models")

The first time, as expected, it downloaded the model from Hugging Face and stored it in the /Users/kossaisbai/data/local_models folder. The second time, it loaded the model directly from that folder rather than from Hugging Face; see the output below:
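For fully air-gapped machines, where even the initial metadata check against the Hub fails as in the original report, a sketch of a belt-and-braces approach is to enable huggingface_hub's offline mode before anything imports it. `HF_HUB_OFFLINE` is a documented huggingface_hub environment variable; whether every code path in faster-whisper honors it is an assumption here, not something confirmed in this thread.

```python
import os

# Must be set before huggingface_hub is imported (i.e. before importing
# whisperx), otherwise the setting may not take effect.
os.environ["HF_HUB_OFFLINE"] = "1"

# Illustrative only -- assumes the model was downloaded on a connected
# machine first and copied into download_root:
# import whisperx
# model = whisperx.load_model("medium", "cpu", compute_type="int8",
#                             download_root="/data/local_models")
```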

No language specified, language will be first be detected for each audio file (increases inference time).
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../.cache/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.2. Bad things might happen unless you revert torch to 1.x.