jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

Pretrained HiFi-GAN vocoder at 16 kHz #125

[Open] narendranp opened 2 years ago

narendranp commented 2 years ago

Hi, does anyone know where a pretrained HiFi-GAN vocoder that works at 16 kHz is available? Alternatively, does anyone have a config file (hyperparameter settings for 16 kHz) that gives the best possible quality at 16 kHz? I have not been able to find the right parameter settings for a 16 kHz vocoder. Any help would be much appreciated.

Thanks in advance, Narendra

linlinsongyun commented 1 year ago
"upsample_rates": [2,5,4,4],
"upsample_kernel_sizes": [16,15,4,4],
"upsample_initial_channel": 512,
"resblock_kernel_sizes": [3,7,11],
"resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
"resblock_initial_channel": 256,

"segment_size": 5120,
"num_mels": 80, 
"num_freq": 512,
"n_fft": 512,
"hop_size": 160,
"win_size": 512,

"sampling_rate": 16000,

I used these parameters for training at 16 kHz.
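As a quick sanity check of the config above, the sketch below (plain Python, not part of the repository; `check_config` is a hypothetical helper) verifies that the product of `upsample_rates` equals `hop_size` (2 × 5 × 4 × 4 = 160), so each mel frame is upsampled to exactly one hop of audio, and derives the frame timings at 16 kHz. The requirement that `segment_size` be a whole number of hops is an assumed convention for keeping audio crops and mel frames aligned, not something stated in this thread.

```python
import math

# Keys copied from the 16 kHz config posted above (only those checked here).
config = {
    "upsample_rates": [2, 5, 4, 4],
    "segment_size": 5120,
    "n_fft": 512,
    "hop_size": 160,
    "win_size": 512,
    "sampling_rate": 16000,
}


def check_config(cfg):
    """Hypothetical helper: validate a few HiFi-GAN-style settings and derive timings."""
    total_upsampling = math.prod(cfg["upsample_rates"])
    # The generator turns each mel frame into `total_upsampling` samples,
    # so this has to equal the hop size used when computing the mels.
    if total_upsampling != cfg["hop_size"]:
        raise ValueError(
            f"prod(upsample_rates)={total_upsampling} != hop_size={cfg['hop_size']}"
        )
    # Assumed convention: training crops span a whole number of mel frames.
    if cfg["segment_size"] % cfg["hop_size"] != 0:
        raise ValueError("segment_size is not a multiple of hop_size")
    # torch.stft requires win_length <= n_fft.
    if cfg["win_size"] > cfg["n_fft"]:
        raise ValueError("win_size must not exceed n_fft")
    sr = cfg["sampling_rate"]
    return {
        "total_upsampling": total_upsampling,
        "frames_per_segment": cfg["segment_size"] // cfg["hop_size"],
        "hop_ms": 1000 * cfg["hop_size"] / sr,
        "window_ms": 1000 * cfg["win_size"] / sr,
        "segment_ms": 1000 * cfg["segment_size"] / sr,
    }


print(check_config(config))
# {'total_upsampling': 160, 'frames_per_segment': 32, 'hop_ms': 10.0, 'window_ms': 32.0, 'segment_ms': 320.0}
```

So each training crop of 5120 samples is 320 ms of audio, or 32 mel frames at a 10 ms hop. Swapping in upsampling rates whose product is not 160 (for example [8, 8, 2, 2], product 256) makes the check fail.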

lixinghe1999 commented 1 year ago

Would you mind sharing the trained checkpoint? Thank you in advance!

a897456 commented 1 year ago

Regarding "upsample_kernel_sizes": [16,15,4,4]: is 15/2 the stride of the second deconvolution (transposed convolution) layer?
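On the kernel-size question: as far as I can tell from the repository's `models.py`, each upsampling layer is a `ConvTranspose1d` whose stride is the matching entry of `upsample_rates` and whose kernel size is the matching entry of `upsample_kernel_sizes`, with `padding=(k - u) // 2`. Please verify that against the code. Under that assumption, the sketch below traces the sequence length through the four layers using PyTorch's output-length formula for `ConvTranspose1d` (with `dilation=1` and `output_padding=0`). A layer upsamples by exactly `u` when `k - u` is even, which holds for every pair in the posted config, including kernel 15 with stride 5; a kernel of 16 on the stride-5 layer would not. The helper names are hypothetical.

```python
# Output length of torch.nn.ConvTranspose1d with dilation=1, output_padding=0:
#   L_out = (L_in - 1) * stride - 2 * padding + kernel_size
def conv_transpose1d_len(l_in, kernel_size, stride, padding):
    return (l_in - 1) * stride - 2 * padding + kernel_size


def generator_lengths(n_frames, upsample_rates, upsample_kernel_sizes):
    """Trace the length through each upsampling layer.

    Assumes padding = (k - u) // 2 per layer (see the note above).
    """
    lengths = [n_frames]
    for u, k in zip(upsample_rates, upsample_kernel_sizes):
        lengths.append(conv_transpose1d_len(lengths[-1], k, u, (k - u) // 2))
    return lengths


# Posted 16 kHz config: strides [2, 5, 4, 4], kernels [16, 15, 4, 4].
print(generator_lengths(32, [2, 5, 4, 4], [16, 15, 4, 4]))
# [32, 64, 320, 1280, 5120]

# Hypothetical variant with kernel 16 on the stride-5 layer (k - u is odd).
print(generator_lengths(32, [2, 5, 4, 4], [16, 16, 4, 4]))
# [32, 64, 321, 1284, 5136]
```

With kernel 15, 32 mel frames become 32 × 160 = 5120 samples, matching `segment_size`. With the hypothetical kernel 16, the stride-5 layer produces one extra sample, and the final output overshoots to 5136.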

PussyCat0700 commented 11 months ago

Mind if I ask why you set n_fft to 512 instead of 4 times the hop_size (4 × 160 = 640), which is a common default setting for the STFT?
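The trade-off behind this question can be quantified with basic STFT arithmetic. The sketch below (illustrative only; `stft_summary` is a hypothetical helper) compares the posted setting (`n_fft` = `win_size` = 512) with the suggested 4 × `hop_size` = 640 at 16 kHz, assuming `win_size` is set equal to `n_fft` in both cases. It reports the one-sided bin count (`n_fft // 2 + 1`), the bin spacing, the window length, and the window-to-hop ratio. It says nothing about which choice produces better audio.

```python
def stft_summary(sampling_rate, n_fft, hop_size, win_size):
    """Basic resolution figures for a one-sided STFT (e.g. torch.stft, onesided=True)."""
    if win_size > n_fft:
        raise ValueError("win_size must not exceed n_fft")
    return {
        "freq_bins": n_fft // 2 + 1,                    # one-sided spectrum size
        "bin_hz": sampling_rate / n_fft,                # spacing between FFT bins
        "window_ms": 1000 * win_size / sampling_rate,   # analysis window length
        "hop_ms": 1000 * hop_size / sampling_rate,      # frame step
        "overlap": win_size / hop_size,                 # avg. frames covering each sample
    }


sr, hop = 16000, 160
posted = stft_summary(sr, n_fft=512, hop_size=hop, win_size=512)
four_x_hop = stft_summary(sr, n_fft=4 * hop, hop_size=hop, win_size=4 * hop)

for name, summary in [("n_fft=512", posted), ("n_fft=640", four_x_hop)]:
    print(name, summary)
```

With 512, the window is 32 ms long with 257 bins spaced 31.25 Hz apart and a 3.2× window-to-hop ratio. With 640, it is 40 ms long with 321 bins spaced 25 Hz apart and exactly 4× overlap. In other words, the longer window trades some time resolution for finer frequency resolution.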