sammefford closed this issue 1 year ago
Yes, it supports multiple languages because it is merely extra logic that runs on top of a Whisper instance. Since this project does not use other models, it does not suffer from the first two limitations. It does, however, lack robustness around speech disfluencies, and the word-level timestamps also lack robustness, though I haven't evaluated it against other methods. So compared to the latter project you mentioned, one pro is that stable-ts does not rely on another model, whereas that project appears to use silero-vad from a quick glance.
Does it support Whisper translation and transcription? How can I set the output language?
Yes, it supports both translation and transcription. It accepts all the arguments that Whisper accepts.
To specify the transcription language, use language='English' in Python or --language English on the CLI.
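For example, a minimal Python sketch of passing those options through. The exact names here (stable_whisper.load_model, the transcribe method, and the audio filename) are assumptions based on the reply above that stable-ts forwards Whisper's own arguments, not a verified excerpt from the project's docs:

```python
# Sketch only: assumes stable_whisper.load_model() wraps a Whisper model and
# that transcribe() forwards Whisper's decoding options such as `language`
# and `task`.
import stable_whisper

model = stable_whisper.load_model('base')

# Force the transcription language instead of letting Whisper auto-detect it.
result = model.transcribe('audio.mp3', language='English')

# Whisper's translation task: render speech in any supported language as English text.
translated = model.transcribe('audio.mp3', task='translate')
```

The CLI equivalent, per the reply above, is the --language English flag (and Whisper's usual --task translate for translation).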
I'd like to use this for a project where multi-language support is important. I can't use whisper-timestamped because it's licensed under the Affero GPL. But it points out some risks with WhisperX that concern me:
So I'm inclined to try your project. Thank you for choosing the MIT license. Should I expect that your project does not suffer from the limitations listed above for WhisperX?
Also, if you can, are there any pros/cons to using your project vs word_level_ts?