sh-lee-prml / BigVGAN

Unofficial pytorch implementation of BigVGAN: A Universal Neural Vocoder with Large-Scale Training
MIT License

Training on different sampling rates #7

Open skol101 opened 2 years ago

skol101 commented 2 years ago

Would be nice if you provided clear instructions on how this model can be trained on different sampling rates.

sh-lee-prml commented 2 years ago

There are so many ways...

First, check the preprocessing method for your Mel-spectrogram.

Second, change the initial frequency values used for resampling:

https://github.com/sh-lee-prml/BigVGAN/blob/main/models_bigvgan.py#L104

Calculate these values according to your sampling rate, hop size, and upsampling rates.

Here is an example using a hop size of 256 and upsampling rates = [8, 8, 2, 2]:

16000 sr --> initial_freq = [500, 4000, 8000, 16000]

22050 sr --> initial_freq = [690, 5513, 11025, 22050]

24000 sr --> initial_freq = [750, 6000, 12000, 24000]
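
The numbers above appear to follow one pattern: start from the Mel frame rate (sampling rate / hop size) and multiply by each upsampling rate in turn, rounding up when the result is not an integer. A minimal sketch of that calculation follows. The function name `initial_freqs` is hypothetical and not part of the repository, and the round-up rule is inferred from the 22050 example (690 and 5513), so check it against models_bigvgan.py before relying on it.

```python
import math


def initial_freqs(sampling_rate, hop_size, upsample_rates):
    """Return the signal rate after each upsampling stage (hypothetical helper).

    Assumption: non-integer rates are rounded up, which matches the
    22050 Hz example in this thread (689.06 -> 690, 5512.5 -> 5513).
    """
    # The upsampling rates must multiply to the hop size, otherwise the
    # last stage would not land on the target sampling rate.
    if math.prod(upsample_rates) != hop_size:
        raise ValueError("product of upsample_rates must equal hop_size")

    freqs = []
    factor = 1
    for rate in upsample_rates:
        factor *= rate
        # Integer ceiling division: ceil(sampling_rate * factor / hop_size)
        freqs.append(-(-sampling_rate * factor // hop_size))
    return freqs


if __name__ == "__main__":
    for sr in (16000, 22050, 24000):
        print(sr, initial_freqs(sr, 256, [8, 8, 2, 2]))
```

For another configuration, pass your own sampling rate, hop size, and upsampling rates. The last value should always equal the sampling rate.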

skol101 commented 2 years ago

Cheers, I'll test it out.

980202006 commented 2 years ago

For different sampling rates, do the parameters of the MPD (multi-period discriminator) also need to be modified, as discussed in https://github.com/jik876/hifi-gan/issues/58?