jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

align() fails with faster whisper & chinese #289

Closed. George0828Zhang closed this issue 10 months ago

George0828Zhang commented 10 months ago

When calling .align() on a faster-whisper model with Chinese (or any other language written without whitespace between words), the assertion at https://github.com/jianfch/stable-ts/blob/a6b2b05568e75b1602a6e23891b59c4a9e218f6b/stable_whisper/alignment.py#L254 fails. The cause is that faster-whisper's tokenizer.language is a token id (50260 for zh) rather than a language code, which in turn causes _split_words to fail: https://github.com/jianfch/stable-ts/blob/a6b2b05568e75b1602a6e23891b59c4a9e218f6b/stable_whisper/timing.py#L111 For faster-whisper models, tokenizer.language_code should be used instead.
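
To illustrate the mismatch, here is a minimal sketch of the kind of normalization that would avoid it. It is not the code from stable-ts: the two tokenizer classes are hypothetical stand-ins that only mimic the attributes named above, resolve_language_code and uses_unicode_splitting are made-up helper names, and the language set is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WhisperLikeTokenizer:
    """Hypothetical stand-in: `language` holds a language code such as "zh"."""
    language: Optional[str]


@dataclass
class FasterWhisperLikeTokenizer:
    """Hypothetical stand-in: `language` is a token id, `language_code` the code."""
    language: int
    language_code: str


# Illustrative subset of languages written without spaces between words (assumption).
NO_SPACE_LANGUAGES = {"zh", "ja", "th"}


def resolve_language_code(tokenizer) -> Optional[str]:
    """Return a language code regardless of which tokenizer flavour is passed."""
    code = getattr(tokenizer, "language_code", None)
    if code is not None:
        return code
    language = getattr(tokenizer, "language", None)
    # Only trust `language` when it is already a string code, not a token id.
    return language if isinstance(language, str) else None


def uses_unicode_splitting(tokenizer) -> bool:
    """Decide whether words should be split per character instead of on spaces."""
    return resolve_language_code(tokenizer) in NO_SPACE_LANGUAGES


if __name__ == "__main__":
    fw = FasterWhisperLikeTokenizer(language=50260, language_code="zh")
    print(fw.language in NO_SPACE_LANGUAGES)  # the naive check misses the id
    print(uses_unicode_splitting(fw))         # the normalized check matches "zh"
```

With the naive check, the integer id never matches a string code, so Chinese text would be treated like a space-delimited language; resolving the code first avoids that.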

jianfch commented 10 months ago

Thanks for reporting this issue. It should be fixed in 677f233bedff857b56bfe48e10d095c40d7f6425.