SpeakLocal

A TTS (text-to-speech) extension for the oobabooga text-generation-webui.

silero_tts is great, but it seems to have a word limit, so I made SpeakLocal.

This extension uses pyttsx4 for speech generation and ffmpeg for audio conversion.
Pyttsx4 uses the native TTS abilities of the host machine (Linux, macOS, Windows), so you shouldn't need to install anything else for this to work.
This extension re-encodes the locally generated .WAV file to an .MP3 and prepends a media player to the text output field.
- The .MP3 encoding is set to ~18 kbps, so the output file is roughly 2 kilobytes for each second of audio (18 kbit/s ÷ 8 ≈ 2.25 kB/s). It's set low to conserve bandwidth when using mobile data.
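
The WAV-to-MP3 flow described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the function names and file paths are hypothetical, and it calls the ffmpeg command-line tool through subprocess instead of the ffmpeg-python bindings the extension uses. It assumes pyttsx4 exposes the same init / save_to_file / runAndWait calls as pyttsx3.

```python
import subprocess


def estimated_mp3_bytes(seconds, bitrate_kbps=18):
    """Rough MP3 size: kilobits per second converted to bytes."""
    return int(seconds * bitrate_kbps * 1000 / 8)


def build_ffmpeg_command(wav_path, mp3_path, bitrate_kbps=18):
    """Build an ffmpeg CLI call that re-encodes a WAV file at a target audio bitrate."""
    # -y overwrites an existing output file, -b:a sets the audio bitrate
    return ["ffmpeg", "-y", "-i", wav_path, "-b:a", f"{bitrate_kbps}k", mp3_path]


def text_to_mp3(text, wav_path="speech.wav", mp3_path="speech.mp3", bitrate_kbps=18):
    """Speak text to a WAV file with the host's TTS engine, then convert it to MP3."""
    import pyttsx4  # imported lazily so the helpers above work without it

    engine = pyttsx4.init()
    engine.save_to_file(text, wav_path)  # hypothetical file names
    engine.runAndWait()
    subprocess.run(build_ffmpeg_command(wav_path, mp3_path, bitrate_kbps), check=True)
    return mp3_path
```

Calling text_to_mp3("Hello there") would need both pyttsx4 and an ffmpeg binary on the machine. The encoder may round an unusual bitrate such as 18k to the nearest rate it supports, so treat the size estimate as approximate.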

Open a command prompt or shell:

cd PATH_TO_text-generation-webui/extensions

Now clone this repo:

git clone https://github.com/ill13/SpeakLocal

If pyttsx4 and ffmpeg-python are not already installed, you may also have to run:

pip install -r requirements.txt

Finally, enable the extension in the Session tab.

If you get this error:

AttributeError: module 'ffmpeg' has no attribute 'input'

Open the command line virtual environment and enter the following:

pip uninstall ffmpeg
pip uninstall ffmpeg-python
pip install ffmpeg-python
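
To confirm the fix took effect, a quick check along these lines may help. It is only a sketch: the function name is hypothetical, and it simply tests whether the module that imports as "ffmpeg" exposes an input attribute, which the ffmpeg-python package does and the unrelated ffmpeg package apparently does not.

```python
import importlib


def ffmpeg_binding_ok(module_name="ffmpeg"):
    """Return True if the named module imports and has an 'input' attribute."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False  # nothing installed under that name
    return hasattr(mod, "input")


if __name__ == "__main__":
    print("ffmpeg-python looks usable:", ffmpeg_binding_ok())
```

Run it inside the same virtual environment the WebUI uses, otherwise it inspects a different set of installed packages.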

On Windows 10, make sure ffmpeg.exe is in your PATH.
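
One way to check whether ffmpeg is reachable is Python's standard shutil.which, which searches the PATH the same way the shell does. The helper name below is hypothetical and this is only a sketch.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    # On Windows, shutil.which also tries the extensions in PATHEXT, such as .exe
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_ffmpeg()
    if location is None:
        print("ffmpeg was not found on PATH")
    else:
        print("ffmpeg found at", location)
```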

Restart Ooba and you should be all set.

Additional audio options:

Voice selection: An enumerated list of TTS voices that are installed on the host.
Speech rate: Speed up or slow down how fast the words are spoken.
Bitrate: Ability to adjust sound quality. Beware, higher bitrate means more data used!
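
For reference, the voice and rate options map onto engine properties roughly as sketched below. This assumes pyttsx4 keeps the pyttsx3-style getProperty / setProperty interface with the "voices", "voice", and "rate" property names; the helper names are hypothetical and this is not the extension's own code.

```python
def list_voices(engine):
    """Return (id, name) pairs for every TTS voice the host reports."""
    return [(voice.id, voice.name) for voice in engine.getProperty("voices")]


def configure_engine(engine, voice_id=None, rate=None):
    """Apply an optional voice and speech rate to the engine."""
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    if rate is not None:
        engine.setProperty("rate", rate)  # higher values speak faster
    return engine


if __name__ == "__main__":
    import pyttsx4  # only needed when run directly

    engine = pyttsx4.init()
    for voice_id, name in list_voices(engine):
        print(voice_id, name)
```

The available voices depend entirely on what is installed on the host, so the list will differ between machines.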
