miroslavpejic85 / mirotalk

🚀 WebRTC - P2P - Simple, Secure, Fast Real-Time Video Conferences Up to 8k and 60fps, compatible with all browsers and platforms.
https://p2p.mirotalk.com
GNU Affero General Public License v3.0
2.89k stars, 546 forks

Use OpenAI's Whisper for the captioning system #234

Closed: EntityinArray closed this issue 2 months ago

EntityinArray commented 2 months ago

Feature request

I couldn't figure out what you use for captioning, but it often fails to recognize simple words.

Is it possible to use OpenAI's Whisper? It's a lightweight, free, and open-source speech-to-text AI model. I think it can run server-side on the CPU.

https://github.com/openai/whisper

It's mind-blowingly good at recognizing speech. Here's a demo: https://youtu.be/Ph6K_0ttsSc?t=869

Pros

miroslavpejic85 commented 2 months ago

> I couldn't figure out what you use for captioning

We use SpeechRecognition
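For context, SpeechRecognition here presumably refers to the browser's Web Speech API interface. The following is a minimal sketch of how captions can be produced with it; this is not MiroTalk's actual code. The helper names `captionFromResultEvent` and `createCaptionRecognizer` and the default language are assumptions made for illustration. The constructor is passed in as a parameter so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: joins the newly finalized transcripts from a
// SpeechRecognition "result" event into a single caption string.
function captionFromResultEvent(event) {
    const parts = [];
    // event.resultIndex is the first result that changed in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
        const result = event.results[i];
        // Skip interim guesses; keep only results the engine has finalized.
        if (result.isFinal) parts.push(result[0].transcript.trim());
    }
    return parts.join(' ');
}

// Hypothetical factory: RecognitionCtor would be window.SpeechRecognition or
// window.webkitSpeechRecognition in a browser, or a test double elsewhere.
function createCaptionRecognizer(RecognitionCtor, onCaption, lang = 'en-US') {
    const recognition = new RecognitionCtor();
    recognition.continuous = true;      // keep listening across pauses
    recognition.interimResults = false; // only deliver finalized results
    recognition.lang = lang;
    recognition.onresult = (event) => {
        const caption = captionFromResultEvent(event);
        if (caption) onCaption(caption);
    };
    return recognition;
}
```

In a browser, this could be wired up with `createCaptionRecognizer(window.SpeechRecognition || window.webkitSpeechRecognition, console.log).start()`. Recognition quality then depends on the speech engine the browser provides, not on the application, which may explain the misrecognized words reported above.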

> Is it possible to use OpenAI's Whisper?

Whisper is written purely in Python, while MiroTalk is built primarily in JavaScript.

Next time, please use the ideas and suggestions channel on our mirotalk-forum.

Thank you!