Open zhanghongyu-git opened 1 year ago
There is a `--language` keyword in the CLI usage. The `transcribe` function doesn't accept a language argument, but `load_model` does, and it builds a pipeline fixed to that language. I know that's not the same as passing it to `transcribe` and replacing the current `self.tokenizer` (which would be an option). Is the `language` keyword a feature that should be added, @m-bain?
Yes, I think the `.transcribe` function should probably allow many different inference kwargs (e.g. changing the language).
I think three frequently used args, `task`, `language`, and `initial_prompt`, should be moved from `load_model()` to `transcribe()` for compatibility with the original whisper.
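
To make the proposal concrete, here is a minimal sketch of per-call arguments overriding load-time defaults. `DecodeOptions` and `SketchPipeline` are hypothetical names made up for illustration; they are not whisperX classes, and the sketch does no real decoding.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DecodeOptions:
    """Hypothetical container for per-inference settings (not a whisperX type)."""
    task: str = "transcribe"
    language: Optional[str] = None        # None means "detect from the audio"
    initial_prompt: Optional[str] = None


class SketchPipeline:
    """Illustrative stand-in for the object returned by load_model()."""

    def __init__(self, defaults: DecodeOptions):
        self.defaults = defaults

    def transcribe(self, audio, **overrides) -> DecodeOptions:
        # Ignore overrides left as None so the load-time defaults still apply.
        given = {k: v for k, v in overrides.items() if v is not None}
        # replace() raises TypeError for unknown keys, which catches typos.
        options = replace(self.defaults, **given)
        # A real pipeline would decode `audio` here; the sketch only returns
        # the options that would be used for this call.
        return options


pipeline = SketchPipeline(DecodeOptions(language="en"))
per_call = pipeline.transcribe(audio=None, language="de", task="translate")
default_call = pipeline.transcribe(audio=None)
```

Keeping the defaults immutable means one call's overrides never leak into the next call, which is the main behavioural difference from setting the language once in `load_model()`.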
related: #191 #259
@zhanghongyu-git Sorry for the late response, but I was able to fix this by just specifying `language=None` while loading the model and choosing the tokenizer accordingly in the code. I don't remember if this needs a PR, but if it does, I might create one.
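
The workaround described above could look roughly like the following sketch: resolve the language per call, then pick (and cache) a tokenizer for it. `TokenizerCache`, `resolve_language`, and `fake_build` are hypothetical helpers for illustration only; the real tokenizer constructor is assumed to be wrapped by the `build` callable.

```python
from typing import Callable, Dict, List, Optional


class TokenizerCache:
    """Hypothetical helper: builds one tokenizer per language and reuses it."""

    def __init__(self, build: Callable[[str], object]):
        self._build = build               # assumed factory wrapping the real tokenizer
        self._cache: Dict[str, object] = {}

    def get(self, language: str) -> object:
        if language not in self._cache:
            self._cache[language] = self._build(language)
        return self._cache[language]


def resolve_language(
    explicit: Optional[str],
    preset: Optional[str],
    detect: Callable[[], str],
) -> str:
    """Order of precedence: per-call argument, load-time value, detection."""
    if explicit is not None:
        return explicit
    if preset is not None:
        return preset
    return detect()


built: List[str] = []                     # records how often the factory runs


def fake_build(language: str) -> str:
    built.append(language)
    return f"tokenizer<{language}>"


cache = TokenizerCache(fake_build)
first = cache.get(resolve_language("de", None, lambda: "en"))
second = cache.get(resolve_language(None, "de", lambda: "en"))
detected = cache.get(resolve_language(None, None, lambda: "en"))
```

With the model loaded using `language=None`, detection only runs when neither the call nor the loader supplied a language, and each tokenizer is built once.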
The `transcribe` function does not support a language argument. `self.detect_language` only looks at the first 30 seconds, but in some videos the first 30 seconds are silence or music, so the detected language is wrong. I wrote my own function to detect the language, but `transcribe` doesn't accept a language argument. Looking forward to the next version supporting this, thanks.
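
One way to work around a silent or musical opening is to run detection on several windows and combine the results. The sketch below is not the commenter's function and not whisperX code: `detect_fn` is an assumed callable returning `(language, probability)` for a chunk of samples, and the thresholds are arbitrary illustration values.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Optional, Sequence, Tuple


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk; 0.0 for an empty chunk."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_language_multiwindow(
    samples: Sequence[float],
    sample_rate: int,
    detect_fn: Callable[[Sequence[float]], Tuple[str, float]],  # assumed detector
    window_s: float = 30.0,
    max_windows: int = 5,
    silence_rms: float = 0.01,   # assumed threshold for "silent"
    min_prob: float = 0.5,       # assumed threshold for a usable guess
) -> Optional[str]:
    """Probability-weighted vote over non-silent windows; None if no usable window."""
    window = int(window_s * sample_rate)
    scores: Dict[str, float] = defaultdict(float)
    used = 0
    for start in range(0, len(samples), window):
        if used >= max_windows:
            break
        chunk = samples[start:start + window]
        if rms(chunk) < silence_rms:
            continue                      # skip silent windows entirely
        used += 1
        language, prob = detect_fn(chunk)
        if prob >= min_prob:              # low-confidence guesses (e.g. music) get no vote
            scores[language] += prob
    if not scores:
        return None
    return max(scores, key=scores.get)


# Toy input: 1 s of silence, 1 s of "music", 1 s of "speech" at 10 samples/s.
toy = [0.0] * 10 + [0.2] * 10 + [0.5] * 10


def fake_detect(chunk: Sequence[float]) -> Tuple[str, float]:
    # Pretend quiet-but-audible chunks are music with an unreliable guess.
    return ("en", 0.3) if chunk[0] < 0.3 else ("ja", 0.9)


result = detect_language_multiwindow(toy, 10, fake_detect, window_s=1.0)
silent = detect_language_multiwindow([0.0] * 30, 10, fake_detect, window_s=1.0)
```

An energy check only filters out silence; music is loud, so it is handled here by discarding low-probability guesses, which depends on the detector reporting low confidence for non-speech audio.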