Through extra experiments I found that ref_level_db in the audio section is a key parameter; I should set the same value in the two config files. At present, the new training run is in progress, and I will continue to update the latest results here.
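For anyone following along, a quick way to confirm the audio settings really match is to diff the audio sections of the two configs. A minimal sketch, assuming both files are plain JSON named config.json and vocoder_config.json:

import json

# Hypothetical file names; point these at your TTS and vocoder configs.
with open("config.json") as f:
    tts_cfg = json.load(f)
with open("vocoder_config.json") as f:
    voc_cfg = json.load(f)

# ref_level_db (and the rest of the audio section) must be identical,
# otherwise the vocoder receives mels normalized differently from what it saw in training.
for key in sorted(set(tts_cfg["audio"]) | set(voc_cfg["audio"])):
    tts_val = tts_cfg["audio"].get(key)
    voc_val = voc_cfg["audio"].get(key)
    if tts_val != voc_val:
        print(f"MISMATCH {key}: tts={tts_val} vocoder={voc_val}")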
Disappointingly, I retrained WaveGrad and it still doesn't work.
Could you share screenshots of your TensorBoard logs?
@SanjaESC Sure, I will share the log files of Tacotron2 and WaveGrad, and I'll also post screenshots to make things easier for you.
Here is the log file of Tacotron2: https://drive.google.com/file/d/1Ex6voGEqEcDFt5Wt7wSR1H_gEyF4edXu/view?usp=sharing
Here is the log file of WaveGrad: https://drive.google.com/file/d/12jwQy6ngKtrbA3pdwKLWCmFfmpqrtU_a/view?usp=sharing
Here are the screenshots of Tacotron2:
Here are the screenshots of WaveGrad:
Looking at the Train and Eval figures from the TTS training, the energy level of the frequencies seems weird (too high?). Maybe that's because it was trained with spec_gain: 1, but I'm not quite sure.
Other than that, the voice samples sound fine. How do you synthesize with WaveGrad? Maybe that's where the problem lies.
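For context on why spec_gain matters: it scales the log-magnitude spectrogram before normalization, so training the TTS and the vocoder with different values shifts the whole mel value range. A rough sketch of that effect (my assumption about how the gain is applied, not the library's exact code):

import numpy as np

def amp_to_db(x, spec_gain):
    # Sketch: the gain multiplies the log-magnitudes, stretching the dynamic range.
    return spec_gain * np.log10(np.maximum(1e-5, x))

mag = np.abs(np.random.randn(80, 100))  # dummy mel magnitudes
print(amp_to_db(mag, 1).min(), amp_to_db(mag, 1).max())    # spec_gain: 1  -> narrow range
print(amp_to_db(mag, 20).min(), amp_to_db(mag, 20).max())  # spec_gain: 20 -> 20x wider range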
Thank you for your prompt reply. Next I will focus on spec_gain. It's also weird that WaveGrad can still synthesize valid speech from ground-truth mels. Maybe I should train another vocoder with the same configuration in the future; I will continue to update the latest results here.
Have you tried retraining both models with scale_stats.npy computed by TTS.bin.compute_statistics? They say not to use it with multi-speaker models, but none of my vocoders worked without the shared stats file, and I got the same results as you.
@albluc24 As you said, I trained a multi-speaker model, so I didn't use scale_stats.npy. Have you managed to solve this problem?
I did train my multi-speaker model with scale_stats, and that's the only way I got it working.
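For reference, the stats file is just a pickled NumPy dictionary of the feature statistics; a minimal sketch for inspecting it, assuming it was written by TTS.bin.compute_statistics to scale_stats.npy:

import numpy as np

# Hypothetical path; this is the file produced by TTS.bin.compute_statistics.
stats = np.load("scale_stats.npy", allow_pickle=True).item()

# It typically holds the mel/linear means and stds used for mean-variance
# normalization; both the TTS and vocoder configs should point "stats_path" at the same file.
for key, value in stats.items():
    print(key, getattr(value, "shape", value))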
Next I will focus on spec_gain
For me, 20 worked quite well.
Regarding WaveGrad, how do you synthesize the speech? Maybe there is an error there and your model is okay.
You are right, I got it wrong. I rechecked and found there is an error in my WaveGrad setup here. I followed #518; although it runs without errors, the synthesized speech is invalid. Consequently, I'll try to train another vocoder with a reasonable spec_gain.
I trained parallel-wavegan and it works with the TTS model, so the cause of this issue is WaveGrad.
@SanjaESC Hi, I tried to fix the WaveGrad synthesis bug, but it didn't work. Is anyone else working on this? I can't find an active issue for this error.
What error exactly do you mean? Are you executing it correctly? What are the steps? It works fine for me.
The error is that I can't synthesize intelligible speech with WaveGrad. After getting your reply, I compared my files with the latest master branch and found a difference in synthesize.py.
Here is my version, though I can't remember why I changed it.
if not use_gl:
    # Use this if the noise schedule was not computed with tune_wavegrad
    beta = np.linspace(1e-6, 0.01, 50)
    vocoder_model.compute_noise_level(beta)
    device_type = "cuda" if use_cuda else "cpu"
    # Here is the reason for the error: the mel gets normalized a second time!
    vocoder_input = ap._normalize(mel_postnet_spec.T)
    waveform = vocoder_model.inference(torch.FloatTensor(vocoder_input).to(device_type).unsqueeze(0))
After the fix, it worked.
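For anyone hitting the same thing, the fix was simply dropping the extra ap._normalize call so the post-net mel is passed to the vocoder as-is, roughly:

if not use_gl:
    # Use this if the noise schedule was not computed with tune_wavegrad
    beta = np.linspace(1e-6, 0.01, 50)
    vocoder_model.compute_noise_level(beta)
    device_type = "cuda" if use_cuda else "cpu"
    # No second normalization here; feed the post-net mel directly.
    vocoder_input = mel_postnet_spec.T
    waveform = vocoder_model.inference(torch.FloatTensor(vocoder_input).to(device_type).unsqueeze(0))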
So it works now?
Yes! Thank you @SanjaESC, I'll close the issue soon.
Hello, I tried to train Taco2 and the vocoder separately, and I can't synthesize intelligible speech with the pre-trained vocoder. What's strange is that with Griffin-Lim the synthesized voice sounds pretty good. Here are synthesized samples with WaveGrad and GL: https://drive.google.com/file/d/1fAf7SGdnWfODQsCgVYBFd2l1g0ZWy1fA/view?usp=sharing https://drive.google.com/file/d/1AVXXRXarlZQ2pmYmg-pI-oUNu9Nx2IHP/view?usp=sharing
Here is the config.json of Taco2:

{
    "model": "Tacotron2",
    "run_name": "ayq",
    "run_description": "tacotron2 with DDC and differential spectral loss.",
}
Here is the vocoder_config.json of WaveGrad:

{
    "run_name": "wavegrad-aqy",
    "run_description": "wavegrad aqy",
}