I like the voice this model creates; however, I can't get it to produce fluid speech longer than about 15 seconds. Past that point the output becomes garbled and loses fidelity. It could be an out-of-memory (OOM) issue, but the process isn't being killed and the system doesn't hang.
Here is the code I'm using:
from nix.models.TTS import NixTTSInference
# from IPython.display import Audio
import wave
import numpy as np

# Initiate Nix-TTS
nix = NixTTSInference(model_dir="/docker/nix-tts/")

# Load prompt.txt from the local directory; it contains the text to be spoken by the model
with open('/docker/nix-tts/prompt.txt', 'r') as file:
    prompt_text = file.read()

# Tokenize input text
c, c_length, phoneme = nix.tokenize(prompt_text)
# Convert text to raw speech
xw = nix.vocalize(c, c_length)
# Listen to the generated speech
# Audio(xw[0, 0], rate=22050)

# Write mono 16-bit PCM at 22,050 Hz; clip to [-1, 1] and scale by 32767
# so full-scale samples don't overflow int16 and wrap around
with wave.open('output.wav', 'wb') as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(22050)
    pcm = (np.clip(xw[0, 0], -1.0, 1.0) * 32767).astype(np.int16)
    wav_file.writeframes(pcm.tobytes())
I'm tempted to split the script into 15-second audio clips, generated in the order I want them, and then stitch them together so the model produces the entire script for the video I'm working on. I really like how it pronounces many words when they're typed correctly.
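The chunking idea could be sketched with a small text splitter. `split_into_chunks` below is a hypothetical helper, not part of Nix-TTS: it breaks the prompt at sentence boundaries into pieces of at most `max_chars` characters, falling back to word boundaries for very long sentences. The 200-character default is an assumption meant to land somewhere under 15 seconds of speech and would need tuning by ear.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    Hypothetical helper (not part of Nix-TTS). A sentence longer than
    max_chars is split further at spaces; a single word longer than
    max_chars is kept whole and may exceed the limit.
    """
    # Collapse newlines and repeated spaces so line breaks in prompt.txt don't matter
    text = " ".join(text.split())
    # Split after ., ! or ? followed by whitespace, keeping the punctuation
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        pieces = [sentence]
        if len(sentence) > max_chars:
            # Break an over-long sentence into word-bounded pieces
            pieces, piece = [], ""
            for word in sentence.split(" "):
                candidate = f"{piece} {word}".strip()
                if len(candidate) > max_chars and piece:
                    pieces.append(piece)
                    piece = word
                else:
                    piece = candidate
            if piece:
                pieces.append(piece)
        # Greedily pack pieces into the current chunk until it would overflow
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) > max_chars and current:
                chunks.append(current)
                current = piece
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then go through `nix.tokenize` and `nix.vocalize` as in the script above, with the `xw[0, 0]` arrays collected in a list and joined with `np.concatenate` (optionally with a short `np.zeros` gap between chunks as a pause) before writing `output.wav`. Splitting at sentence ends rather than at a fixed character count keeps each clip a complete phrase, so the seams should be less noticeable.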