Open Adliyan opened 3 years ago
I encountered the same problem, but I hadn't retrained hifigan, so I thought that was the reason. Now I have resampled the wav files to 22050 Hz and am retraining. Did you change every step that needs the sampling rate? For example, at preprocess.py line 172, where the wav file is read, the original code does not pass a sampling rate parameter.
Changing that single line in preprocessor/preprocessor.py
fixed this issue for me, training with 16 kHz audio. Thanks for the pointer!
@dan-wells where did you change the sampling rate? Can you please share the code? And can we use wav files with different sampling rates in the dataset for this model?
@azman63
I suppose the change is from this: wav, _ = librosa.load(wav_path) to this: wav, _ = librosa.load(wav_path, sr=16000)
Thank you so much! I completely forgot that librosa loads audio at 22050 Hz by default.
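On the question of mixed sampling rates in the dataset: since librosa.load resamples to whatever sr is passed (22050 by default), files at other rates get converted silently. If you would rather know about them up front, a rough sketch like the one below can list the files whose header rate differs from the rate in your config. It uses only the standard-library wave module; the function name find_rate_mismatches and the flat directory layout are assumptions for illustration, not part of this repository.

```python
import wave
from pathlib import Path


def find_rate_mismatches(wav_dir, expected_sr):
    """Return {filename: header_rate} for WAV files whose rate differs from expected_sr.

    Hypothetical helper: assumes plain PCM .wav files directly inside wav_dir.
    The stdlib wave module may fail on other encodings (for example float WAVs).
    """
    mismatches = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # Only the header is needed, so no audio samples are decoded here.
        with wave.open(str(path), "rb") as w:
            rate = w.getframerate()
        if rate != expected_sr:
            mismatches[path.name] = rate
    return mismatches
```

An empty result means every file already matches the expected rate. Otherwise the listed files are the ones that would be resampled on load when sr is set explicitly.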
I changed the sampling rate parameter in preprocess.yaml to 16k and retrained both fastspeech2 and hifigan on data with a 16 kHz sampling rate, but the synthesized speech sounds very strange. Played back at 16 kHz, it is very slow. Played back at 22 kHz, it sounds much more normal, as if the synthesized audio were still at a 22 kHz sampling rate. So I want to ask whether there are other parameters in the model that affect the sampling rate of the synthesized speech.
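The slow playback is consistent with simple arithmetic: a player computes duration as the number of samples divided by the rate stored in the file header, so the same samples labelled 16000 Hz instead of 22050 Hz take 22050 / 16000 ≈ 1.38 times as long to play. The sketch below illustrates only that relationship, using the standard-library wave module. The function name wav_duration_seconds is made up for this example, and it does not identify which model setting is responsible.

```python
import io
import wave


def wav_duration_seconds(n_samples, header_rate):
    """Write n_samples of 16-bit mono silence and return the duration implied by the header.

    Hypothetical helper for illustration only; the WAV is built in memory.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)           # mono
        w.setsampwidth(2)           # 16-bit samples
        w.setframerate(header_rate)
        w.writeframes(b"\x00\x00" * n_samples)
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        # Duration as a player would compute it: frames / header rate.
        return r.getnframes() / r.getframerate()


n = 22050  # one second of audio if it was generated at 22050 samples per second
d_22k = wav_duration_seconds(n, 22050)
d_16k = wav_duration_seconds(n, 16000)
print(d_22k, d_16k, d_16k / d_22k)
```

If the output only sounds right at 22050 Hz, the number of samples generated per second of speech still matches 22050. That would suggest looking at the generation side, for example the vocoder's own configuration or the frame and hop settings, rather than at the player, though that is a guess from the symptoms.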