savbell / whisper-writer

πŸ’¬πŸ“ A small dictation app using OpenAI's Whisper speech recognition model.
GNU General Public License v3.0

WhisperWriter

WhisperWriter demo gif

Update (2024-05-28): I've just merged in a major rewrite of WhisperWriter! We've migrated from using tkinter to using PyQt5 for the UI, added a new settings window for configuration, a new continuous recording mode, support for a local API, and more! Please be patient as I work out any bugs that may have been introduced in the process. If you encounter any problems, please open a new issue!

WhisperWriter is a small speech-to-text app that uses OpenAI's Whisper model to auto-transcribe recordings from a user's microphone to the active window.

Once started, the script runs in the background and waits for a keyboard shortcut to be pressed (ctrl+shift+space by default). When the shortcut is pressed, the app starts recording from your microphone. There are four recording modes to choose from:

You can change the keyboard shortcut (activation_key) and recording mode in the Configuration Options. While recording and transcribing, a small status window shows the current stage of the process (this can be turned off). Once the transcription is complete, the transcribed text is automatically written to the active window.
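To make the shortcut handling a bit more concrete, here is a minimal sketch of how a combination string such as `ctrl+shift+space` could be parsed and matched against key events. This is not WhisperWriter's actual listener code: `parse_hotkey` and `HotkeyTracker` are hypothetical names, and the key names are assumed to arrive as plain strings from whatever keyboard library is in use.

```python
def parse_hotkey(combo):
    """Split a string like 'ctrl+shift+space' into a normalized set of key names."""
    return frozenset(part.strip().lower() for part in combo.split("+") if part.strip())


class HotkeyTracker:
    """Hypothetical helper: tracks held keys and reports when the combo completes."""

    def __init__(self, combo):
        self.required = parse_hotkey(combo)
        self.pressed = set()

    def press(self, key):
        """Record a key press; return True only when this press completes the combo."""
        was_active = self.required <= self.pressed
        self.pressed.add(key.lower())
        # Fire once per activation, not on every auto-repeated key event.
        return not was_active and self.required <= self.pressed

    def release(self, key):
        """Record a key release."""
        self.pressed.discard(key.lower())


tracker = HotkeyTracker("ctrl+shift+space")
for key in ("ctrl", "shift", "space"):
    if tracker.press(key):
        print("activation key pressed: start recording")
```

Comparing against the previous state means a held-down shortcut triggers only once, which matters for modes where a second press has a different meaning than the first.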

The transcription can either be done locally through the faster-whisper Python package or through a request to OpenAI's API. By default, the app will use a local model, but you can change this in the Configuration Options. If you choose to use the API, you will need to either provide your OpenAI API key or change the base URL endpoint.
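As a rough illustration of the two transcription paths, the sketch below calls `faster-whisper` locally or sends the audio to an OpenAI-compatible endpoint. It is not the app's own code: the function names, the `use_api` flag, the `base` model size, and the placeholder key for local endpoints are all assumptions made for the example.

```python
import os


def pick_backend(use_api, api_key=None, base_url=None):
    """Hypothetical helper: decide which transcription path to take."""
    if not use_api:
        return "local"
    if not api_key and not base_url:
        raise ValueError("API mode needs an OpenAI API key or a custom base URL")
    return "api"


def transcribe_local(audio_path, model_name="base"):
    """Transcribe with faster-whisper on the CPU (model size is an assumption)."""
    from faster_whisper import WhisperModel  # imported lazily; optional dependency

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return "".join(segment.text for segment in segments).strip()


def transcribe_api(audio_path, api_key=None, base_url=None, model_name="whisper-1"):
    """Transcribe through OpenAI's API or a compatible local server."""
    from openai import OpenAI  # imported lazily; optional dependency

    # Assumption: a local server ignores the key, so a placeholder is enough there.
    key = api_key or os.environ.get("OPENAI_API_KEY") or "not-needed"
    client = OpenAI(api_key=key, base_url=base_url)
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model_name, file=audio_file)
    return result.text


def transcribe(audio_path, use_api=False, api_key=None, base_url=None):
    """Route a recording to the local model or the API."""
    if pick_backend(use_api, api_key, base_url) == "local":
        return transcribe_local(audio_path)
    return transcribe_api(audio_path, api_key, base_url)
```

For example, `transcribe("recording.wav")` would use the local model, while `transcribe("recording.wav", use_api=True, base_url="http://localhost:8000/v1")` would target a local API server (the URL here is only a placeholder).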

Fun fact: Almost the entirety of the initial release of the project was pair-programmed with ChatGPT-4 and GitHub Copilot using VS Code. Practically every line, including most of this README, was written by AI. After the initial prototype was finished, WhisperWriter was used to write a lot of the prompts as well!

Getting Started

Prerequisites

Before you can run this app, you'll need to have the following software installed:

If you want to run faster-whisper on your GPU, you'll also need to install the following NVIDIA libraries:

More information on GPU execution

The below was taken directly from the [`faster-whisper` README](https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#gpu):

**Note:** The latest versions of `ctranslate2` support CUDA 12 only. For CUDA 11, the current workaround is downgrading to the `3.24.0` version of `ctranslate2` (this can be done with `pip install --force-reinstall ctranslate2==3.24.0`).

There are multiple ways to install the NVIDIA libraries mentioned above. The recommended way is described in the official NVIDIA documentation, but we also suggest other installation methods below.

#### Use Docker

The libraries (cuBLAS, cuDNN) are installed in these official NVIDIA CUDA Docker images: `nvidia/cuda:12.0.0-runtime-ubuntu20.04` or `nvidia/cuda:12.0.0-runtime-ubuntu22.04`.

#### Install with `pip` (Linux only)

On Linux these libraries can be installed with `pip`. Note that `LD_LIBRARY_PATH` must be set before launching Python.

```bash
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12

export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
```

**Note:** Version 9+ of `nvidia-cudnn-cu12` appears to cause issues due to its reliance on cuDNN 9 (Faster-Whisper does not currently support cuDNN 9). Ensure your version of the Python package is for cuDNN 8.

#### Download the libraries from Purfview's repository (Windows & Linux)

Purfview's [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provides the required NVIDIA libraries for Windows & Linux in a [single archive](https://github.com/Purfview/whisper-standalone-win/releases/tag/libs). Decompress the archive and place the libraries in a directory included in the `PATH`.
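If the `LD_LIBRARY_PATH` one-liner is hard to read, the same lookup can be written as a small standalone script. This is only a sketch of the idea, assuming the `pip`-installed packages expose the `nvidia.cublas.lib` and `nvidia.cudnn.lib` modules used in the command above; `find_library_dirs` is a hypothetical helper name, and unlike the one-liner it skips packages that are not installed instead of failing.

```python
import importlib.util
import os

# Module names taken from the one-liner in the faster-whisper instructions.
NVIDIA_LIB_MODULES = ("nvidia.cublas.lib", "nvidia.cudnn.lib")


def find_library_dirs(module_names=NVIDIA_LIB_MODULES):
    """Return the directory of each importable package, skipping missing ones."""
    dirs = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # A parent package (e.g. 'nvidia') is not installed at all.
            continue
        if spec is None or not spec.submodule_search_locations:
            continue
        dirs.append(list(spec.submodule_search_locations)[0])
    return dirs


if __name__ == "__main__":
    # Prints a value suitable for LD_LIBRARY_PATH (empty if nothing was found).
    print(os.pathsep.join(find_library_dirs()))
```

An empty result is a quick hint that the NVIDIA packages are not installed in the active environment.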

Installation

To set up and run the project, follow these steps:

1. Clone the repository:

git clone https://github.com/savbell/whisper-writer
cd whisper-writer

2. Create a virtual environment and activate it:

python -m venv venv

# For Linux and macOS:
source venv/bin/activate

# For Windows:
venv\Scripts\activate

3. Install the required packages:

pip install -r requirements.txt

4. Run the Python code:

python run.py

5. Configure and start WhisperWriter:

On first run, a Settings window should appear. Once you have configured and saved your settings, another window will open. Press "Start" to activate the keyboard listener, then press the activation key (ctrl+shift+space by default) to start recording and transcribing to the active window.

Configuration Options

WhisperWriter uses a configuration file to customize its behaviour. To set up the configuration, open the Settings window:

WhisperWriter Settings window demo gif

Model Options

Recording Options

Post-processing Options

Miscellaneous Options

If any of the configuration options are invalid or not provided, the program will use the default values.
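The fallback behaviour described above can be pictured with a short sketch like the one below. It is illustrative only, not the project's actual config loader: apart from `activation_key` and its `ctrl+shift+space` default, the option name `use_api`, the validators, and `resolve_config` are assumptions for the example.

```python
# Defaults: activation_key comes from this README; use_api is a hypothetical
# option name reflecting that the local model is used by default.
DEFAULTS = {
    "activation_key": "ctrl+shift+space",
    "use_api": False,
}

# One validity check per option (hypothetical rules for illustration).
VALIDATORS = {
    "activation_key": lambda value: isinstance(value, str) and bool(value.strip()),
    "use_api": lambda value: isinstance(value, bool),
}


def resolve_config(user_config):
    """Overlay valid user values on the defaults; ignore missing or invalid ones."""
    resolved = dict(DEFAULTS)
    for key, is_valid in VALIDATORS.items():
        if key in user_config and is_valid(user_config[key]):
            resolved[key] = user_config[key]
    return resolved


print(resolve_config({"activation_key": "ctrl+alt+r", "use_api": "yes"}))
```

Here the custom shortcut is kept, while the invalid `use_api` value (a string instead of a boolean) falls back to its default.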

Known Issues

You can see all reported issues and their current status in our Issue Tracker. If you encounter a problem, please open a new issue with a detailed description and reproduction steps, if possible.

Roadmap

Below are features I am planning to add in the near future:

Below are features not currently planned:

Implemented features can be found in the CHANGELOG.

Contributing

Contributions are welcome! I created this project for my own personal use and didn't expect it to get much attention, so I haven't put much effort into testing or making it easy for others to contribute. If you have ideas or suggestions, feel free to open a pull request or create a new issue. I'll do my best to review and respond as time allows.

Credits

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.