Maybe this is because Random Gaussian noise is used as the input for PWG. If you want to fix the results, please set random seed.
I saw this line in the synthesis code. Does it have any effect on the results?
vocoder.remove_weight_norm()
I think it is not related.
Thanks for your quick reply. Will the random seed also fix the dropout in the Tacotron 2 model? Is it possible to fix the state of PWG and not Taco2?
Set a random seed -> Taco2 -> set a fixed seed -> PWG?
import time
import torch
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.tts_inference import Text2Speech
from parallel_wavegan.utils import download_pretrained_model
from parallel_wavegan.utils import load_model
d = ModelDownloader()
text2speech = Text2Speech(
    **d.download_and_unpack(tag),
    device="cuda",
    # Only for Tacotron 2
    threshold=0.5,
    minlenratio=0.0,
    maxlenratio=10.0,
    use_att_constraint=False,
    backward_window=1,
    forward_window=3,
    # Only for FastSpeech & FastSpeech2
    speed_control_alpha=1.0,
)
text2speech.spc2wav = None # Disable griffin-lim
vocoder = load_model(download_pretrained_model(vocoder_tag)).to("cuda").eval()
vocoder.remove_weight_norm()
with torch.no_grad():
    start = time.time()
    wav, c, *_ = text2speech(x)
    wav = vocoder.inference(c)
    rtf = (time.time() - start) / (len(wav) / fs)
    print(f"RTF = {rtf:5f}")
Looking at this code, I am not sure in which two places I need to set the random and fixed seeds?
with torch.no_grad():
    start = time.time()
    # set the (random) seed here for Taco2
    wav, c, *_ = text2speech(x)
    # set the (fixed) seed here for PWG
    wav = vocoder.inference(c)
    rtf = (time.time() - start) / (len(wav) / fs)
    print(f"RTF = {rtf:5f}")
Thank you @kan-bayashi. I am looking forward to trying some of the new vocoders you introduced as well :) Great job!
Hi, I have noticed that the loudness of the synthesized waveform varies for PWG. Is it possible to make sure that the synthesized waveform always has the same loudness?
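One simple post-processing option, sketched below with arbitrary target values (this is not from the repo, just a hedged example), is to rescale each synthesized waveform to a common RMS level before saving:

import numpy as np

def normalize_loudness(wav, target_rms=0.05, peak=0.95):
    """Rescale a waveform to a fixed RMS, then limit the peak to avoid clipping."""
    wav = np.asarray(wav, dtype=np.float32)
    rms = np.sqrt(np.mean(wav ** 2)) + 1e-9
    wav = wav * (target_rms / rms)
    max_amp = np.max(np.abs(wav))
    if max_amp > peak:
        wav = wav * (peak / max_amp)
    return wav

# e.g. applied to the vocoder output tensor from the snippet above
wav_normalized = normalize_loudness(wav.cpu().numpy())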