shirayu / whispering

Streaming transcriber with whisper
MIT License
686 stars, 54 forks

Remove multi language feature (Revert #20) #23

Closed shirayu closed 1 year ago

shirayu commented 1 year ago

I read the Whisper code and noticed that a multilingual tokenizer is not supported in Whisper.

When language is None, the tokenizer used for the "multilingual Whisper models" (tiny, base, small, medium, large) does not cover all languages; it falls back to English (en).

https://github.com/openai/whisper/blob/9e653bd0ea0f1e9493cb4939733e9de249493cfb/whisper/tokenizer.py#L295-L316

    if multilingual:
        tokenizer_name = "multilingual"
        task = task or "transcribe"
        language = language or "en"
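
To make the fallback concrete, here is a minimal, self-contained sketch of the same logic. It does not import Whisper; `resolve_tokenizer_settings` is a hypothetical helper written for illustration, and the non-multilingual branch (including the `"english-only"` name) is a placeholder assumption, since the excerpt above only shows the multilingual branch.

```python
from typing import Optional, Tuple


def resolve_tokenizer_settings(
    multilingual: bool,
    language: Optional[str] = None,
    task: Optional[str] = None,
) -> Tuple[str, Optional[str], Optional[str]]:
    """Hypothetical helper mirroring the fallback shown in the excerpt."""
    if multilingual:
        tokenizer_name = "multilingual"
        # A missing task defaults to transcription.
        task = task or "transcribe"
        # A missing language silently becomes English, not "all languages".
        language = language or "en"
    else:
        # Placeholder branch (assumption): not part of the quoted excerpt.
        tokenizer_name = "english-only"
    return tokenizer_name, language, task


# Passing language=None to a multilingual model resolves to English.
print(resolve_tokenizer_settings(multilingual=True, language=None))
# An explicit language is kept as given.
print(resolve_tokenizer_settings(multilingual=True, language="ja"))
```

Under these assumptions, `language=None` and `language="en"` produce the same tokenizer settings for a multilingual model, which is why None cannot be treated as an "all languages" tokenizer mode.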

Revert #20. Related to #21.

AlexandraRamassamy commented 1 year ago

Hi @shirayu,

I see you have reverted my PR to fix #23. I understand that the tokenizer does not support a multilanguage mode; however, multilanguage transcription works fine on my end. I think there is an error in the code you have edited, as language should be None and not opts.language. Thanks for looking into this :)