jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

Does this support multiple languages? #77

Closed sammefford closed 1 year ago

sammefford commented 1 year ago

I'd like to use this for a project where multi-language support is important. I can't use whisper-timestamped because it's licensed under the Affero GPL. But its README points out some risks with WhisperX that concern me:

So I'm inclined to try your project. Thank you for choosing the MIT license. Should I expect that your project doesn't suffer from the WhisperX limitations listed above?

Also, if you can say, are there any pros or cons to using your project vs word_level_ts?

jianfch commented 1 year ago

Yes, it supports multiple languages, because stable-ts is merely extra logic that runs on top of a whisper instance. Since this project does not use any other models, it does not suffer from the first two limitations. But it does lack robustness around speech disfluencies, and the word-level timestamps also lack robustness; I haven't evaluated them against other methods, though. So compared to the latter project you mentioned, one pro is that stable-ts does not rely on another model, whereas that project appears to use silero-vad from a quick glance.
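To make the pass-through design concrete, here is a rough sketch (the model size and file name are placeholders, and the exact shape of the returned result depends on the stable-ts version):

```python
import stable_whisper

# stable-ts patches a stock Whisper model rather than loading a
# separate alignment model, so every language Whisper supports is
# available here as well
model = stable_whisper.load_model('base')

# with no `language` argument, Whisper auto-detects the spoken language
result = model.transcribe('audio.mp3')
```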

bernardoforcillo commented 1 year ago

Does it support whisper translation and transcription? How can I set the output language?

jianfch commented 1 year ago

Yes, it supports both translation and transcription, and it accepts all the arguments that whisper accepts. To specify the transcription language: `language='English'` in Python, or `--language English` on the CLI.
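A minimal sketch of both tasks in Python (the audio path and model size are placeholders; `language` and `task` are standard whisper arguments passed through by stable-ts):

```python
import stable_whisper

model = stable_whisper.load_model('base')

# transcription: force the output language instead of auto-detection
result = model.transcribe('audio.mp3', language='English')

# translation: render non-English speech as English text
result = model.transcribe('audio.mp3', task='translate')
```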