mozilla / TTS

Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Mozilla Public License 2.0

RuntimeError: failed to find espeak library #742

Closed Jzow closed 2 years ago

Jzow commented 2 years ago

import os
import torch
import time

from TTS.tts.utils.generic_utils import setup_model
from TTS.utils.io import load_config
from TTS.tts.utils.text.symbols import symbols, phonemes
from TTS.utils.audio import AudioProcessor
from TTS.tts.utils.synthesis import synthesis
from TTS.vocoder.utils.generic_utils import setup_generator

def tts(model, text, CONFIG, use_cuda, ap, use_gl, figures=True):
    t_1 = time.time()
    waveform, alignment, mel_spec, mel_postnet_spec, stop_tokens, inputs = synthesis(model, text, CONFIG, use_cuda, ap, speaker_id, style_wav=None,
                                                                             truncated=False, enable_eos_bos_chars=CONFIG.enable_eos_bos_chars)
    # mel_postnet_spec = ap.denormalize(mel_postnet_spec.T)
    if not use_gl:
        waveform = vocoder_model.inference(torch.FloatTensor(mel_postnet_spec.T).unsqueeze(0))
        waveform = waveform.flatten()
    if use_cuda:
        waveform = waveform.cpu()
    waveform = waveform.numpy()
    rtf = (time.time() - t_1) / (len(waveform) / ap.sample_rate)
    tps = (time.time() - t_1) / len(waveform)
    print(waveform.shape)
    print(" > Run-time: {}".format(time.time() - t_1))
    print(" > Real-time factor: {}".format(rtf))
    print(" > Time per step: {}".format(tps))
    return alignment, mel_postnet_spec, stop_tokens, waveform

use_cuda = False

TTS_MODEL = "models/textTovideo/tts_model.pth.tar"
TTS_CONFIG = "models/textTovideo/config.json"
VOCODER_MODEL = "models/textTovideo/vocoder_model.pth.tar"
VOCODER_CONFIG = "models/textTovideo/config_vocoder.json"

TTS_CONFIG = load_config(TTS_CONFIG)
VOCODER_CONFIG = load_config(VOCODER_CONFIG)

TTS_CONFIG.audio['stats_path'] = 'models/textTovideo/scale_stats.npy'
ap = AudioProcessor(**TTS_CONFIG.audio)

speaker_id = None
speakers = []

# load the model
num_chars = len(phonemes) if TTS_CONFIG.use_phonemes else len(symbols)
model = setup_model(num_chars, len(speakers), TTS_CONFIG)

# load model state
cp = torch.load(TTS_MODEL, map_location=torch.device('cpu'))

# load the checkpoint weights into the model
model.load_state_dict(cp['model'])
if use_cuda:
    model.cuda()
model.eval()

# set model stepsize
if 'r' in cp:
    model.decoder.set_r(cp['r'])

# LOAD VOCODER MODEL
vocoder_model = setup_generator(VOCODER_CONFIG)
vocoder_model.load_state_dict(torch.load(VOCODER_MODEL, map_location="cpu")["model"])
vocoder_model.remove_weight_norm()
vocoder_model.inference_padding = 0

ap_vocoder = AudioProcessor(**VOCODER_CONFIG['audio'])
if use_cuda:
    vocoder_model.cuda()
vocoder_model.eval()

sentence = "Bill got in the habit of asking himself “Is that thought true?” and if he wasn’t absolutely certain it was, he just let it go."
align, spec, stop_tokens, wav = tts(model, sentence, TTS_CONFIG, use_cuda, ap, use_gl=False, figures=True)

An error is reported when I run the demo as described in the documentation. I don't know where the key 'gst_embedding_dim: null' is set.
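The `tts()` helper in the script above reports a real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio, `len(waveform) / ap.sample_rate`. Values below 1.0 mean synthesis runs faster than playback. A minimal standalone sketch of the same arithmetic; `real_time_factor` is a hypothetical name, not part of the TTS API:

```python
def real_time_factor(elapsed_s: float, num_samples: int, sample_rate: int) -> float:
    """Processing time divided by audio duration, mirroring the rtf line in tts()."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_duration_s = num_samples / sample_rate  # seconds of generated audio
    return elapsed_s / audio_duration_s


# 2 s of compute for 44100 samples at 22050 Hz (i.e. 2 s of audio)
print(real_time_factor(2.0, 44100, 22050))  # prints 1.0
```

With the 22050 Hz sample rate from the Audio Processor log, an RTF of 0.5 would mean one second of speech took half a second to synthesize.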

Jzow commented 2 years ago

#736 describes the same problem.

Jzow commented 2 years ago

I have solved the above problem. Now I want to know how to run inference on Windows. I have downloaded espeak.

New error:

C:\Users\Administrator\AppData\Local\Programs\Python\Python38\python.exe E:/iston_algorithm/util/speech/TextTest.py
2022-01-14 23:56:56.824717: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2022-01-14 23:56:56.824831: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:0
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > stats_path:models/textTovideo/scale_stats.npy
 | > hop_length:256
 | > win_length:1024
 > Using model: Tacotron2
 > Generator Model: multiband_melgan_generator
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:0
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > stats_path:E:/iston_algorithm/util/speech/models/textTovideo/scale_stats.npy
 | > hop_length:256
 | > win_length:1024
Traceback (most recent call last):
  File "E:/iston_algorithm/util/speech/TextTest.py", line 80, in <module>
    align, spec, stop_tokens, wav = tts(model, sentence, TTS_CONFIG, use_cuda, ap, use_gl=False, figures=True)
  File "E:/iston_algorithm/util/speech/TextTest.py", line 13, in tts
    waveform, alignment, mel_spec, mel_postnet_spec, stop_tokens, inputs = synthesis(model, text, CONFIG, use_cuda, ap, speaker_id, style_wav=None,
  File "e:\tts\TTS\tts\utils\synthesis.py", line 235, in synthesis
    inputs = text_to_seqvec(text, CONFIG)
  File "e:\tts\TTS\tts\utils\synthesis.py", line 15, in text_to_seqvec
    phoneme_to_sequence(text, text_cleaner, CONFIG.phoneme_language,
  File "e:\tts\TTS\tts\utils\text\__init__.py", line 87, in phoneme_to_sequence
    to_phonemes = text2phone(clean_text, language)
  File "e:\tts\TTS\tts\utils\text\__init__.py", line 50, in text2phone
    ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language, preserve_punctuation=True, language_switch='remove-flags')
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\phonemizer\phonemize.py", line 175, in phonemize
    phonemizer = BACKENDS[backend](
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\phonemizer\backend\espeak\espeak.py", line 44, in __init__
    super().__init__(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\phonemizer\backend\espeak\base.py", line 36, in __init__
    self._espeak = EspeakWrapper()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\phonemizer\backend\espeak\wrapper.py", line 59, in __init__
    self._espeak = EspeakAPI(self.library())
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\phonemizer\backend\espeak\wrapper.py", line 149, in library
    raise RuntimeError(
RuntimeError: failed to find espeak library

Process finished with exit code 1
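The traceback ends in phonemizer's `EspeakWrapper.library()`. As far as I can tell from phonemizer 3.x, that method first honours `PHONEMIZER_ESPEAK_LIBRARY` (failing with a different, "not a readable file" message if the path is wrong) and otherwise falls back to `ctypes.util.find_library('espeak-ng')`, then `'espeak'`. If that reading is right, the message "failed to find espeak library" suggests the variable was not visible to the Python process at all. The sketch below approximates that lookup order so you can check what your interpreter actually sees; `find_espeak_library` is a hypothetical diagnostic helper, not a phonemizer function, and the real logic may differ between versions.

```python
import ctypes.util
import os
import pathlib
from typing import Mapping, Optional


def find_espeak_library(environ: Optional[Mapping[str, str]] = None) -> str:
    """Approximate phonemizer 3.x's espeak lookup (an assumption, not its real code)."""
    environ = os.environ if environ is None else environ
    if "PHONEMIZER_ESPEAK_LIBRARY" in environ:
        library = pathlib.Path(environ["PHONEMIZER_ESPEAK_LIBRARY"])
        # An explicitly configured path must point to a readable file.
        if not (library.is_file() and os.access(library, os.R_OK)):
            raise RuntimeError(f"PHONEMIZER_ESPEAK_LIBRARY={library} is not a readable file")
        return str(library.resolve())
    # Otherwise fall back to the system library search, preferring espeak-ng.
    found = ctypes.util.find_library("espeak-ng") or ctypes.util.find_library("espeak")
    if not found:
        raise RuntimeError("failed to find espeak library")
    return found


if __name__ == "__main__":
    # Shows whether this interpreter inherited the variable at all.
    print("PHONEMIZER_ESPEAK_LIBRARY =", os.environ.get("PHONEMIZER_ESPEAK_LIBRARY"))
    try:
        print("espeak library:", find_espeak_library())
    except RuntimeError as err:
        print("lookup failed:", err)
```

Running it from the same IDE or terminal as TextTest.py matters, because that is the environment phonemizer will see.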
Jzow commented 2 years ago

Following other issues, I set the environment variables below, but the error is still reported. Can anyone help me? Thank you very much.

Environment variables:

PHONEMIZER_ESPEAK_LIBRARY: C:\Program Files\eSpeak NG\libespeak-ng.dll
PHONEMIZER_ESPEAK_PATH: C:\Program Files (x86)\eSpeak\command_line\espeak.exe
Path: C:\Program Files (x86)\eSpeak\command_line

I have installed both eSpeak and eSpeak NG and applied the settings from the following issues, but the program still fails with an error:

#735, #44, #148

The error is unchanged; the traceback is identical to the one above and ends with:

RuntimeError: failed to find espeak library

Process finished with exit code 1
Jzow commented 2 years ago

#44

Jzow commented 2 years ago

I have solved this problem on Windows 11. It requires eSpeak NG instead of eSpeak. My environment variables are as follows:

PHONEMIZER_ESPEAK_LIBRARY: C:\Program Files\eSpeak NG\libespeak-ng.dll
PHONEMIZER_ESPEAK_PATH: C:\Program Files\eSpeak NG\espeak-ng.exe

However, I had to restart the computer before the environment variables took effect. I don't know whether this is down to the OS or to how the environment is loaded. I remember that when I configured Java environment variables, they took effect immediately. Why doesn't setting PHONEMIZER_ESPEAK_PATH to C:\Program Files\eSpeak NG\espeak-ng.exe take effect immediately?

I hope someone can help. Thanks for providing the TTS open-source library.
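A process receives a copy of its environment when it starts, and child processes inherit that copy from their parent. Variables added through the Windows System Properties dialog are therefore not seen by a terminal or IDE that was already running, nor by the Python processes it launches; restarting that program (rather than the whole computer) is usually enough. Another option is to set the variables inside the script itself, before phonemizer's espeak backend is created (in this thread, before `synthesis()` runs), since the traceback shows the library lookup happening at that point. A minimal sketch, assuming the eSpeak NG install directory from the working setup above; `configure_espeak` is a hypothetical helper, not part of TTS or phonemizer:

```python
import os
from pathlib import Path
from typing import Dict, MutableMapping

# Install directory from the working Windows 11 setup above; adjust to your machine.
ESPEAK_NG_DIR = Path(r"C:\Program Files\eSpeak NG")


def configure_espeak(
    install_dir: Path, environ: MutableMapping[str, str] = os.environ
) -> Dict[str, str]:
    """Point phonemizer at eSpeak NG for the current process only (hypothetical helper)."""
    library = install_dir / "libespeak-ng.dll"
    executable = install_dir / "espeak-ng.exe"
    for path in (library, executable):
        if not path.is_file():
            raise FileNotFoundError(f"expected eSpeak NG file not found: {path}")
    # Writing to os.environ affects this process and any child it starts,
    # not the system-wide settings, so no restart is needed.
    environ["PHONEMIZER_ESPEAK_LIBRARY"] = str(library)
    environ["PHONEMIZER_ESPEAK_PATH"] = str(executable)
    return {
        "PHONEMIZER_ESPEAK_LIBRARY": environ["PHONEMIZER_ESPEAK_LIBRARY"],
        "PHONEMIZER_ESPEAK_PATH": environ["PHONEMIZER_ESPEAK_PATH"],
    }


if __name__ == "__main__":
    # Call this near the top of TextTest.py, before tts()/synthesis() is called.
    try:
        print(configure_espeak(ESPEAK_NG_DIR))
    except FileNotFoundError as err:
        print(err)
```

If your installed phonemizer provides `EspeakWrapper.set_library()` (recent 3.x releases appear to), calling it with the DLL path is another in-process option; check your version before relying on it.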

nmstoker commented 2 years ago

Hi @Jzow - I think you may not have noticed, but this repo isn't being maintained. As you'll see, there have been no commits in 12 months. Best to look at https://github.com/coqui-ai/TTS

Jzow commented 2 years ago

@nmstoker thanks

Jzow commented 2 years ago

@nmstoker Hi, I wonder why maintenance of Mozilla TTS has stopped.

nmstoker commented 2 years ago

There were significant layoffs at Mozilla back in August 2020.