m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
12.45k stars 1.31k forks

transcribe does not support a language argument #241

Open zhanghongyu-git opened 1 year ago

zhanghongyu-git commented 1 year ago

The transcribe function does not support a language argument. self.detect_language only looks at the first 30 seconds of audio, but in some videos the first 30 seconds are silence or music, so the detected language is wrong. I have my own function to detect the language, but I can't pass its result to transcribe because there is no language argument. Looking forward to support for this in the next version, thanks!
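One way to work around the 30-second limitation is to choose which 30-second window to run language detection on, skipping leading silence. This is a minimal sketch under stated assumptions: `rms`, `pick_detection_window` and the `min_rms` threshold are hypothetical helpers, not whisperX functions, and the audio is assumed to be a plain list of mono samples. An energy threshold only skips near-silent audio; it cannot tell music from speech.

```python
import math
from typing import Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pick_detection_window(
    samples: Sequence[float],
    sample_rate: int = 16_000,
    window_s: float = 30.0,
    hop_s: float = 30.0,
    min_rms: float = 0.01,
) -> Optional[int]:
    """Return the start index of the first window loud enough for language
    detection, or None if every window is below min_rms.

    Hypothetical helper: it only skips near-silent audio, not music.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    start = 0
    while start < len(samples):
        if rms(samples[start:start + window]) >= min_rms:
            return start
        start += hop
    return None


if __name__ == "__main__":
    sr = 16_000
    # 30 s of silence followed by 30 s of a constant non-zero signal.
    audio = [0.0] * (30 * sr) + [0.1] * (30 * sr)
    print(pick_detection_window(audio, sr))  # 480000, i.e. the 30 s mark
```

The returned index marks the slice `samples[start:start + window]` to hand to whichever language detector you use; the detected language could then be passed to `load_model`, which already accepts a language.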

sorgfresser commented 1 year ago

There is the --language flag for CLI usage. The transcribe function doesn't accept a language, but load_model does, and it builds a pipeline for that exact language. I know that's not the same as passing the language to transcribe and discarding the current self.tokenizer (which would be an option). Is a language keyword for transcribe a feature that should be added, @m-bain?
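The option of passing the language to `transcribe` and discarding `self.tokenizer` can be sketched as a toy model. Everything here (`Tokenizer`, `Pipeline`, the injected `detect_language` callable) is a hypothetical stand-in, not whisperX's actual classes. An explicit `language` argument wins, then the load-time language, then auto-detection, and the cached tokenizer is rebuilt only when the language changes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Tokenizer:
    """Stand-in for a tokenizer bound to one language and task."""
    language: str
    task: str = "transcribe"


class Pipeline:
    """Toy pipeline whose transcribe() accepts an optional language."""

    def __init__(self, detect_language: Callable[[Sequence[float]], str],
                 language: Optional[str] = None) -> None:
        self.detect_language = detect_language  # hypothetical detector
        self.default_language = language        # value given at load time
        self.tokenizer: Optional[Tokenizer] = (
            Tokenizer(language) if language else None)

    def transcribe(self, audio: Sequence[float],
                   language: Optional[str] = None) -> str:
        # Precedence: explicit argument > load-time language > auto-detection.
        language = (language or self.default_language
                    or self.detect_language(audio))
        # Discard the cached tokenizer only when the language changes.
        if self.tokenizer is None or self.tokenizer.language != language:
            self.tokenizer = Tokenizer(language)
        # A real pipeline would decode here; the toy just reports its choice.
        return f"[{self.tokenizer.language}] {len(audio)} samples"


if __name__ == "__main__":
    p = Pipeline(detect_language=lambda audio: "en")
    print(p.transcribe([0.0] * 16000))                 # [en] 16000 samples
    print(p.transcribe([0.0] * 16000, language="zh"))  # [zh] 16000 samples
```

Keeping the load-time language separate from the cached tokenizer means a pipeline loaded with `language=None` still detects on every call instead of reusing whatever language the first file happened to be.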

m-bain commented 1 year ago

Yes, I think the .transcribe function should probably accept many different inference kwargs (e.g. to change the language).

phineas-pta commented 1 year ago

I think the three commonly used args, task, language and initial_prompt, should be moved from load_model() to transcribe() for compatibility with the original Whisper.

related: #191 #259
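Moving task, language and initial_prompt to call time can be sketched as load-time defaults that each call may override. `DecodeOptions` and `resolve_options` are hypothetical names chosen for illustration; they are not part of whisperX or openai-whisper.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class DecodeOptions:
    """Hypothetical bundle of the three per-call arguments."""
    task: str = "transcribe"
    language: Optional[str] = None
    initial_prompt: Optional[str] = None


def resolve_options(defaults: DecodeOptions, **overrides) -> DecodeOptions:
    """Return the load-time defaults with any non-None overrides applied."""
    known = {f.name for f in fields(DecodeOptions)}
    unknown = set(overrides) - known
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    # None means "keep the load-time value".
    changes = {k: v for k, v in overrides.items() if v is not None}
    return replace(defaults, **changes)


if __name__ == "__main__":
    defaults = DecodeOptions(language="en")  # set at load time
    print(resolve_options(defaults, language="zh", initial_prompt="Meeting notes"))
    # DecodeOptions(task='transcribe', language='zh', initial_prompt='Meeting notes')
```

Treating `None` as "keep the default" keeps call sites short, but it also means a call cannot switch back to auto-detection (`language=None`) once a default language was set at load time; a sentinel object would be needed for that.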

arnavmehta7 commented 1 year ago

@zhanghongyu-git Sorry for the late response, but I was able to fix this by specifying language=None when loading the model and choosing the tokenizer accordingly in the code. Not sure if this needs a PR, but if it does, I might create one.