Closed: oep42 closed this issue 1 year ago
Use --language=en for English. See the full list of language codes here: https://github.com/openai/whisper/blob/248b6cb124225dd263bb9bd32d060b6517e067f8/whisper/tokenizer.py#L10
@DigilConfianz If only "en" is accepted but not "English", then WhisperX's current help text should be updated, because it lists both forms as valid choices:
whisperx --help
usage: whisperx [-h] [--model MODEL] [--model_dir MODEL_DIR] [--device DEVICE] [--batch_size BATCH_SIZE]
[--compute_type {float16,float32,int8}] [--output_dir OUTPUT_DIR]
[--output_format {all,srt,vtt,txt,tsv,json}] [--verbose VERBOSE] [--task {transcribe,translate}]
[--language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,
he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,
ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh,Afrikaans,Albanian,Amharic,Arabic,
Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Castilian,Catalan,
Chinese,Croatian,Czech,Danish,Dutch,English,...
In addition, if WhisperX did not support full language names, it would behave differently from several other Whisper variants, which doesn't seem like a good idea.
Surprisingly, WhisperX 3.1.1 doesn't recognize "English" as a language code. (It does recognize "en".)
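
Until this is resolved, one possible workaround is to map a full language name to its code before passing it to --language. The following is only a minimal sketch, not WhisperX's actual code: the function name normalize_language and the four-entry table are hypothetical, and the complete name-to-code list would have to come from the tokenizer.py file linked above.

```python
# Hypothetical subset of a name-to-code table, for illustration only.
# The complete list is in whisper/tokenizer.py.
LANGUAGE_CODES = {
    "english": "en",
    "german": "de",
    "french": "fr",
    "spanish": "es",
}


def normalize_language(value: str) -> str:
    """Accept either a code ("en") or a full name ("English") and return the code."""
    key = value.strip().lower()
    if key in LANGUAGE_CODES.values():
        # Already a known code.
        return key
    if key in LANGUAGE_CODES:
        # A full language name, matched case-insensitively.
        return LANGUAGE_CODES[key]
    raise ValueError(f"Unsupported language: {value!r}")
```

With this helper, both normalize_language("English") and normalize_language("en") yield "en", and the result can be passed as --language.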