jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Does this support multiple languages? #77

Closed sammefford closed 1 year ago

sammefford commented 1 year ago

I'd like to use this for a project where multi-language support is important. I can't use whisper-timestamped because it's licensed under the Affero GPL. But its README points out some risks with WhisperX that concern me:

So I'm inclined to try your project. Thank you for choosing the MIT license. Should I expect that your project doesn't suffer from the WhisperX limitations listed above?

Also, if you can say, are there any pros or cons to using your project vs word_level_ts?

jianfch commented 1 year ago

Yes, it supports multiple languages, because stable-ts is merely extra logic that runs on top of a whisper instance. Since this project does not use any other models, it does not suffer from the first two limitations. But it does lack robustness around speech disfluencies, and the word-level timestamps also lack robustness; I haven't evaluated them against other methods, though. So compared to the latter project you mentioned, one pro is that stable-ts does not rely on another model, whereas that project appears to use silero-vad from a quick glance.
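To make the pass-through design concrete, here is a rough sketch (the model size and file name are placeholders, and the exact shape of the returned result depends on the stable-ts version):

```python
import stable_whisper

# stable-ts patches a stock Whisper model rather than loading a
# separate alignment model, so every language Whisper supports is
# available here as well
model = stable_whisper.load_model('base')

# with no `language` argument, Whisper auto-detects the spoken language
result = model.transcribe('audio.mp3')
```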

bernardoforcillo commented 1 year ago

Does it support whisper translation and transcription? How can I set the output language?

jianfch commented 1 year ago

Yes, it supports both translation and transcription, and it accepts all the arguments that whisper accepts. To specify the transcription language: `language='English'` in Python, or `--language English` on the CLI.
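A minimal sketch of both tasks in Python (the audio path and model size are placeholders; `language` and `task` are standard whisper arguments passed through by stable-ts):

```python
import stable_whisper

model = stable_whisper.load_model('base')

# transcription: force the output language instead of auto-detection
result = model.transcribe('audio.mp3', language='English')

# translation: render non-English speech as English text
result = model.transcribe('audio.mp3', task='translate')
```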