m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Trouble specifying an external language model (Swedish) #168

waterbottlebottle opened this issue 1 year ago (status: Open)

waterbottlebottle commented 1 year ago

Hi, I got WhisperX working successfully with French. Then I tried Swedish, using a model I found online:

whisperx .\input.wav --model large --align_model viktor-enzell/wav2vec2-large-voxrex-swedish-4gram

It seemed to work: the console printed Swedish text the entire way through. Then, at the very end, it errored:

New language found (sv)! Previous was (en), loading new alignment model for new language...
There is no default alignment model set for this language (sv).                Please find a wav2vec2.0 model finetuned on this language in https://huggingface.co/models, then pass the model name in --align_model [MODEL_NAME]
Traceback (most recent call last):
  File "[...]\Python310\Scripts\whisperx-script.py", line 33, in <module>
    sys.exit(load_entry_point('whisperx==2.0', 'console_scripts', 'whisperx')())
  File "[...]\Python310\lib\site-packages\whisperx\transcribe.py", line 182, in cli
    align_model, align_metadata = load_align_model(result["language"], device)
  File "[...]\Python310\lib\site-packages\whisperx\alignment.py", line 53, in load_align_model
    raise ValueError(f"No default align-model for language: {language_code}")
ValueError: No default align-model for language: sv

And then it didn't write out the .srt, even though the console showed it had successfully transcribed all the Swedish along the way!

Any idea what I'm doing wrong? (Thanks in advance, sorry if this is a silly question!)

waterbottlebottle commented 1 year ago

Update: ah, OK. Trying again with a smaller file, I see I need to add --language sv.

I guess I'm still slightly surprised since it did everything even without the flag and only choked at the very end, but at least now I know!
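A rough way to see why the run only failed at the very end: the log and traceback above suggest that, without --language, the CLI loads the --align_model under an assumed language of English ("Previous was (en)"), and then, once Whisper detects Swedish, calls load_align_model(result["language"], device) again without the user's model name, so only the built-in defaults are consulted. The sketch below is a simplified, hypothetical Python model of that flow. The function name load_align_model and its error message mirror the traceback, but the bodies, the run_cli helper and the DEFAULT_ALIGN_MODELS table (with placeholder model names) are assumptions for illustration, not whisperX's actual code.

```python
# Simplified, HYPOTHETICAL model of the align-model selection suggested by the
# traceback in this issue. Not whisperX's real implementation.

# Assumed table: only some languages have a built-in default alignment model.
# Values are placeholder strings, not real model identifiers.
DEFAULT_ALIGN_MODELS = {"en": "default-en-model", "fr": "default-fr-model"}


def load_align_model(language_code, model_name=None):
    """Return (model, metadata); fail if no name is given and no default exists."""
    if model_name is None:
        if language_code not in DEFAULT_ALIGN_MODELS:
            raise ValueError(f"No default align-model for language: {language_code}")
        model_name = DEFAULT_ALIGN_MODELS[language_code]
    return model_name, {"language": language_code}


def run_cli(detected_language, language=None, align_model=None):
    """Hypothetical stand-in for the CLI: load, transcribe, then maybe reload."""
    # Before transcription: load the user's --align_model, labelled with
    # --language if given, otherwise assumed to be English.
    align_language = language if language is not None else "en"
    model, metadata = load_align_model(align_language, model_name=align_model)

    # Transcription succeeds either way; --language forces the result language.
    result_language = language if language is not None else detected_language

    # After transcription: on a language mismatch, reload WITHOUT the user's
    # model name, so a language with no default (such as sv) raises here.
    if result_language != metadata["language"]:
        print(f"New language found ({result_language})! "
              f"Previous was ({metadata['language']})")
        model, metadata = load_align_model(result_language)
    return model


swedish_model = "viktor-enzell/wav2vec2-large-voxrex-swedish-4gram"

# Without --language: transcription "works", then alignment fails at the end.
try:
    run_cli("sv", align_model=swedish_model)
except ValueError as err:
    print(err)  # No default align-model for language: sv

# With --language sv: labels match, no reload happens, the Swedish model is kept.
print(run_cli("sv", language="sv", align_model=swedish_model))
```

In this model, passing --language sv works because the custom model is labelled "sv" from the start, so the post-transcription language check never triggers a reload.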