[Feature request] Implement OpenAI Whisper for TTS Option

leon-ai / leon

🧠 Leon is your open-source personal assistant.

https://getleon.ai

MIT License

[Feature request] Implement OpenAI Whisper for TTS Option #446

Open pablogranolabar opened 1 year ago

pablogranolabar commented 1 year ago

Feature Use Case

Implement OpenAI Whisper ASR for state-of-the-art (SOTA) speech-to-text (STT) and wakeword triggers.

Feature Proposal

OpenAI recently released Whisper, a SOTA ASR model. Recent developments around Whisper include third-party implementations that support distilled model weights and reduced-precision inference, which make it feasible to run Whisper on CPU platforms.
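To put the reduced-precision point in numbers, here is a rough sketch that estimates weight memory for the original Whisper checkpoints at fp32, fp16 and int8. The parameter counts are the approximate figures listed in the openai/whisper README; the function and dictionary names are made up for illustration, and the estimate ignores activations, runtime overhead and any further savings from distilled weights.

```python
# Rough weight-memory estimate for the original Whisper checkpoints at
# different numeric precisions. Parameter counts are the approximate sizes
# listed in the openai/whisper README. Activations, audio buffers and
# runtime overhead are NOT included.
PARAM_COUNTS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large": 1_550_000_000,
}

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(model: str, precision: str) -> float:
    """Approximate size of the model weights in megabytes (10**6 bytes)."""
    return PARAM_COUNTS[model] * BYTES_PER_PARAM[precision] / 1_000_000


if __name__ == "__main__":
    # Print one row per checkpoint with its size at every precision.
    for name in PARAM_COUNTS:
        row = ", ".join(
            f"{p}: {weight_memory_mb(name, p):,.0f} MB" for p in BYTES_PER_PARAM
        )
        print(f"{name:>6} -> {row}")
```

For example, `weight_memory_mb("large", "int8")` gives about 1,550 MB versus about 6,200 MB at fp32, which is the kind of reduction that makes CPU deployment plausible.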

louistiti commented 1 year ago

Interesting, thanks. Added to the roadmap cards for the new offline STT and the new hotword solution.

louistiti commented 1 year ago

✨ [1.0.0] Implement new offline STT

louistiti commented 1 year ago

🔧 [1.0.0] Implement new hotword solution

johannbarbie commented 1 year ago

From the model card: "While Whisper models cannot be used for real-time transcription out of the box – their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation."

Currently, Whisper works on 30-second chunks of audio, so I guess Leon's responses would become very delayed.
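A minimal sketch of where the delay concern comes from, assuming the behavior of the reference openai-whisper code: audio is resampled to 16 kHz and each model input is padded or trimmed to 30 s (480,000 samples; see `whisper.audio.pad_or_trim`). The helper names below are hypothetical, and the sketch only counts padding; it does not measure compute time.

```python
from typing import List

SAMPLE_RATE = 16_000                      # Whisper expects 16 kHz mono audio
CHUNK_LENGTH_S = 30                       # window length used by the reference implementation
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH_S  # 480,000 samples per window


def split_into_windows(samples: List[float], window: int = N_SAMPLES) -> List[List[float]]:
    """Split audio into fixed-size windows, zero-padding the last one.

    Mirrors the idea behind whisper.audio.pad_or_trim: every window the model
    sees has exactly `window` samples, however little speech it contains.
    """
    windows = []
    for start in range(0, max(len(samples), 1), window):
        chunk = samples[start:start + window]
        chunk = chunk + [0.0] * (window - len(chunk))  # pad short tail with silence
        windows.append(chunk)
    return windows


def padding_fraction(speech_seconds: float) -> float:
    """Share of the model input that is padding for an utterance of this length."""
    n = int(speech_seconds * SAMPLE_RATE)
    windows = split_into_windows([0.1] * n)  # dummy non-silent samples
    total = len(windows) * N_SAMPLES
    return (total - n) / total
```

`padding_fraction(3.0)` is 0.9: a 3-second voice command fills only a tenth of the window the model processes.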

louistiti commented 1 year ago

Thanks for pointing this out. I'll take a closer look once I focus on it.

pablogranolabar commented 1 year ago

Nah, Whisper is configurable for whatever input length you specify. We have a Flutter port going now that is near-real-time on mobile. The larger models should run in real time on CPU.
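A minimal sketch of the shorter-window approach described in the comment, assuming audio arrives as a stream of 16 kHz samples. `sliding_windows`, `window_s` and `hop_s` are hypothetical names, not part of Whisper or of the Flutter port mentioned above; each yielded window would be passed to whatever transcription call the chosen implementation offers.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio


def sliding_windows(
    stream: Iterable[float], window_s: float = 2.0, hop_s: float = 1.0
) -> Iterator[List[float]]:
    """Yield overlapping windows from an incoming sample stream (hypothetical helper).

    A new window is emitted every `hop_s` seconds once `window_s` seconds of
    audio have been buffered. The overlap gives words cut at a window
    boundary a second chance to be transcribed whole.
    """
    window = int(window_s * SAMPLE_RATE)
    hop = int(hop_s * SAMPLE_RATE)
    if window <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must be positive")
    buf: Deque[float] = deque(maxlen=window)  # keeps only the most recent samples
    since_last = 0
    for sample in stream:
        buf.append(sample)
        since_last += 1
        # Emit once the buffer is full and at least one hop has elapsed.
        if len(buf) == window and since_last >= hop:
            yield list(buf)
            since_last = 0
```

With `window_s=2.0` and `hop_s=1.0`, the first window is ready after 2 s of audio instead of 30 s, and a new one follows every second. Whether that also lowers compute cost depends on the implementation: the reference openai-whisper code pads every input to 30 s internally, so the gain there is in buffering delay, not in per-window work.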