pablogranolabar opened this issue 1 year ago
Interesting, thanks. Added to this roadmap card and this one.
from the model card:
While Whisper models cannot be used for real-time transcription out of the box – their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation.
Currently, Whisper works on 30-second chunks of audio. I guess Leon's responses would become very delayed.
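For context, here is roughly where the 30-second figure comes from in the reference openai-whisper Python package: every input is padded or trimmed to a fixed 30-second window before the mel spectrogram is computed. A minimal sketch (the audio file path is hypothetical):

```python
import whisper

model = whisper.load_model("base")

audio = whisper.load_audio("speech.wav")   # hypothetical file path
audio = whisper.pad_or_trim(audio)         # pad or trim to exactly 30 seconds

mel = whisper.log_mel_spectrogram(audio).to(model.device)
result = whisper.decode(model, mel, whisper.DecodingOptions(fp16=False))
print(result.text)
```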
Thanks for pointing this out. I'll take a closer look once I'm focusing on it.
Nah, Whisper is configurable for whatever input length you specify; we have a Flutter port going now that is near real-time on mobile. The larger models should still be real-time in performance on CPU.
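As a rough sketch of the chunked, near-real-time approach (assuming the openai-whisper package and numpy; the audio source here is just a placeholder for a microphone or streaming capture): shorter chunks are zero-padded internally up to the 30 s window, so you can transcribe short segments as they arrive.

```python
import numpy as np
import whisper

CHUNK_SECONDS = 5
SAMPLE_RATE = 16000  # Whisper expects 16 kHz mono float32 audio

model = whisper.load_model("base")

def transcribe_chunk(samples: np.ndarray) -> str:
    # transcribe() accepts raw 16 kHz samples; inputs shorter than 30 s are
    # zero-padded internally, so the chunk length is effectively up to you
    return model.transcribe(samples.astype(np.float32), fp16=False)["text"]

# Placeholder audio source: in a real integration this would be a microphone
# or streaming capture yielding ~CHUNK_SECONDS of samples at a time.
def audio_chunks(n_chunks: int = 3):
    for _ in range(n_chunks):
        yield np.zeros(CHUNK_SECONDS * SAMPLE_RATE, dtype=np.float32)

for chunk in audio_chunks():
    print(transcribe_chunk(chunk))
```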
Feature Use Case
Implement OpenAI Whisper ASR for SOTA STT and wake-word triggers.
Feature Proposal
OpenAI recently released Whisper, a SOTA ASR model. Recent developments around Whisper include third-party implementations that support distilled model weights and reduced-precision inference, which is sufficient to run Whisper on CPU platforms.
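As one illustration (not necessarily the specific ports referenced above), faster-whisper is a third-party reimplementation built on CTranslate2 that supports int8 reduced-precision inference on CPU. A minimal sketch, with a hypothetical audio file path:

```python
# Illustrative only: faster-whisper is one example of a third-party Whisper
# port with reduced-precision (int8) inference suitable for CPU-only hosts.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("speech.wav")  # hypothetical file path
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```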