reriiasu / speech-to-text

Real-time transcription using faster-whisper
MIT License

speech-to-text

Real-time transcription using faster-whisper

architecture

Audio is captured from a microphone using sounddevice. Silero VAD (Voice Activity Detection) detects the silent parts, and the speech between them is treated as a single audio segment. Each segment is then converted to text using faster-whisper.
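The segmentation step can be illustrated with a minimal sketch. It is not the code used in this project: it assumes the VAD has already produced one speech probability per audio frame, and the function name `split_on_silence` and the parameters `threshold` and `min_silence_frames` are hypothetical, chosen only for this example.

```python
from typing import List, Tuple


def split_on_silence(
    speech_probs: List[float],
    threshold: float = 0.5,        # assumed cut-off between speech and silence
    min_silence_frames: int = 3,   # assumed pause length that ends a segment
) -> List[Tuple[int, int]]:
    """Group frames into segments, returned as (start, end) frame indices.

    `end` is exclusive. A segment ends once `min_silence_frames`
    consecutive frames fall below `threshold`.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # first speech frame of the segment being built
    silence_run = 0   # consecutive silent frames since the last speech frame

    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # Close the segment just after its last speech frame.
                segments.append((start, i - silence_run + 1))
                start = None
                silence_run = 0

    # Flush a segment that is still open when the input ends.
    if start is not None:
        segments.append((start, len(speech_probs) - silence_run))
    return segments


probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.7, 0.9, 0.1]
print(split_on_silence(probs))  # [(1, 5), (8, 10)]
```

In the example, the single quiet frame at index 3 is shorter than the assumed pause length, so it stays inside the first segment. Each resulting frame range would then be passed to the transcription model as one piece of audio.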

The HTML-based GUI lets you check the transcription results and configure faster-whisper in detail.

Transcription speed

If the sentences are well separated, the transcription takes less than a second.

Measured with the large-v2 model,
running with CUDA 11.7 on an NVIDIA GeForce RTX 3060 12GB.

Installation

  1. pip install .

For Windows

Run "run.bat". It performs the following actions:

  1. Create a Python virtual environment.
  2. Install pip packages.
  3. Run speech_to_text.

Usage

  1. python -m speech_to_text
  2. Select "App Settings" and configure the settings.
  3. Select "Model Settings" and configure the settings.
  4. Select "Transcribe Settings" and configure the settings.
  5. Select "VAD Settings" and configure the settings.
  6. Start Transcription

To use the OpenAI API for text proofreading, set the OPENAI_API_KEY environment variable.
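Before enabling proofreading, it can help to check that the variable is actually visible to Python. The following is a small sketch, not part of this project; the helper name `read_openai_api_key` is hypothetical, and only the variable name OPENAI_API_KEY comes from this README.

```python
import os
from typing import Mapping, Optional


def read_openai_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the OPENAI_API_KEY value, or None if it is unset or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the helper easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return key or None


if read_openai_api_key() is None:
    print("OPENAI_API_KEY is not set; text proofreading will not be available.")
```

The variable must be set in the same shell or session that launches `python -m speech_to_text`, otherwise the process will not see it.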

Notes

Demo

demo

News

2023-06-26

2023-06-29

2023-07-03

2023-07-05

2023-07-08

2023-07-09

2023-07-11

2023-07-12

2023-10-01

2023-11-27

2024-07-23

Todo