saharmor / whisper-playground

Build real-time speech-to-text web apps using OpenAI's Whisper (https://openai.com/blog/whisper/)
MIT License

Whisper Playground

Instantly build real-time speech-to-text apps in 99 languages using faster-whisper, Diart, and Pyannote
Try it via the online demo


https://github.com/ethanzrd/whisper-playground/assets/79014814/44a9bcf0-e374-4c71-8189-1d99824fbdc5

Setup

  1. Have Conda and Yarn on your device
  2. Clone or fork this repository
  3. Install the backend and frontend environments: `sh install_playground.sh`
  4. Review `config.py` to make sure the transcription device and compute type match your setup, and review `config.js` to confirm it matches the backend config and that the backend address is correct.
  5. Run the backend: `cd backend && python server.py`
  6. In a different terminal, run the React frontend: `cd interface && yarn start`
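
The steps above can be sketched as a single shell session. This is a sketch, not part of the repository's scripts; it assumes the repo's default directory layout, and the frontend will connect to whatever backend address `config.js` specifies:

```shell
# Clone the repository and enter it
git clone https://github.com/saharmor/whisper-playground.git
cd whisper-playground

# Step 3: set up the Conda environment and install frontend packages
sh install_playground.sh

# Step 5: start the backend (keep this terminal open)
cd backend && python server.py

# Step 6: in a second terminal, start the React frontend
cd interface && yarn start
```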

Access to Pyannote Models

This repository uses libraries built on pyannote.audio models, which are hosted on the Hugging Face Hub. You must accept their terms of use before using them. Note: you need a Hugging Face account to use pyannote.

  1. Accept terms for the pyannote/segmentation model
  2. Accept terms for the pyannote/embedding model
  3. Accept terms for the pyannote/speaker-diarization model
  4. Install huggingface-cli and log in with your user access token (can be found in Settings -> Access Tokens)
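
Step 4 can be sketched as follows (a minimal example, assuming a recent `huggingface_hub` release that bundles the CLI):

```shell
# Install the Hugging Face Hub client, which provides the huggingface-cli command
pip install -U huggingface_hub

# Log in so the pyannote models can be downloaded;
# paste your user access token (Settings -> Access Tokens) when prompted
huggingface-cli login
```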

Parameters

Troubleshooting

Known Bugs

  1. In sequential mode, speakers may be swapped unpredictably.
  2. In real-time mode, audio segments shorter than the transcription timeout won't be transcribed.

This repository hasn't been tested for all languages; please create an issue if you encounter any problems.

License

Both this repository and Whisper's code and model weights are released under the MIT License.