r9y9 / wavenet_vocoder

WaveNet vocoder
https://r9y9.github.io/wavenet_vocoder/

How to train a WaveNet model with a different sample rate? #221

Open lanberGB opened 1 year ago

lanberGB commented 1 year ago

Hi everyone, I am trying to train a MoL WaveNet model on a different dataset. My dataset's sampling rate is 48 kHz, so I changed the sample rate parameter in the .json file to 48 kHz and raised fft_size and win_length to 4096. All other parameters were left at their defaults.

Here is the problem: after 100,000 steps the model generated two wave files, step000100000_predicted.wav and step000100000_target.wav (48 kHz, 16-bit). When I played them, the speech in both files was much faster than in the dataset, and I couldn't understand anything that was said. It sounds as if the sample rate used for playback doesn't match the sample rate of the audio data the model itself generated.

So I changed the sample rate parameter back to the default value (22.05 kHz), leaving the other parameters unchanged. After 200,000 steps the model generated step000200000_predicted.wav and step000200000_target.wav, and this time everything worked fine: both sound exactly like the dataset.

Why does this happen? Did I do something wrong, or did I miss something? I plan to use this vocoder together with a Tacotron 2 model (trained on the same dataset, also with the sample rate set to 48 kHz). Will this cause the two models to not work properly together?
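The "sounds faster" symptom can be quantified with a little arithmetic. The sketch below assumes the samples were actually produced at 22.05 kHz but written into a WAV header that declares 48 kHz. In that case playback runs 48000 / 22050 ≈ 2.18× too fast and one second of audio lasts only about 0.46 s. It also computes the time between mel-spectrogram frames, since a Tacotron 2 + vocoder pair generally needs matching feature parameters. The helper names (`playback_speedup`, `frame_shift_ms`, `write_wav`, `wav_duration_s`) are hypothetical and not part of wavenet_vocoder. The hop size of 256 samples is an assumed value for illustration; check the actual hparams in your .json file.

```python
import io
import wave


def playback_speedup(data_sr: int, header_sr: int) -> float:
    """How much faster audio sounds when samples produced at data_sr
    are stored in a WAV file whose header declares header_sr."""
    return header_sr / data_sr


def frame_shift_ms(hop_size: int, sample_rate: int) -> float:
    """Time between consecutive mel-spectrogram frames, in milliseconds."""
    return 1000.0 * hop_size / sample_rate


def write_wav(samples: bytes, sample_rate: int) -> bytes:
    """Write 16-bit mono PCM samples into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit, as in the issue
        w.setframerate(sample_rate)
        w.writeframes(samples)
    return buf.getvalue()


def wav_duration_s(wav_bytes: bytes) -> float:
    """Playback duration implied by the sample rate in the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


if __name__ == "__main__":
    # One second of (silent) audio whose samples were produced at 22.05 kHz...
    one_second_22k = b"\x00\x00" * 22050
    # ...but labelled as 48 kHz in the WAV header (assumed mismatch).
    wav = write_wav(one_second_22k, 48000)
    print(f"plays for {wav_duration_s(wav):.3f} s")            # 0.459 s
    print(f"speed-up: {playback_speedup(22050, 48000):.2f}x")  # 2.18x
    # hop_size = 256 is an assumed value; use the one from your hparams.
    print(f"frame shift @22.05 kHz: {frame_shift_ms(256, 22050):.2f} ms")  # 11.61 ms
    print(f"frame shift @48 kHz:    {frame_shift_ms(256, 48000):.2f} ms")  # 5.33 ms
```

The frame-shift numbers show why the sample rate cannot be changed in isolation. With the same hop size, each mel frame covers a different span of time at 48 kHz than at 22.05 kHz. Therefore the acoustic model's mel output and the vocoder's conditioning features should be computed with the same sample_rate, fft_size, hop_size and mel settings.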