rishikksh20 / HiFi-GAN

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

Configuration settings #7

Closed (DeepDubbed closed 3 years ago)

DeepDubbed commented 3 years ago

Hey! I'll start by saying I really appreciate all the work you do in implementing papers that don't come with code!

I've noticed that your "config" generally uses a 22 kHz sampling rate.
Your mel-spectrogram settings limit mel_fmax to 8000.0, which is the Nyquist frequency of a 16 kHz sampling rate, so everything between 8 kHz and 11.025 kHz is discarded. You should consider using mel_fmax: 11025.0. It's also possible that using only 80 mel channels limits output quality somewhat, so it might be worth experimenting with more channels.
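
To make the arithmetic concrete, here is a minimal sketch of the Nyquist calculation. It assumes the sampling rate is exactly 22050 Hz (the usual meaning of "22 kHz"); the helper name nyquist_hz is made up for illustration and is not part of the repo.

```python
def nyquist_hz(sampling_rate: int) -> float:
    """Highest frequency a signal sampled at `sampling_rate` can represent."""
    return sampling_rate / 2.0


sampling_rate = 22050  # assumed value behind "22 kHz"
mel_fmax = 8000.0      # value currently in the config

# Upper limit available at 22050 Hz.
print(nyquist_hz(sampling_rate))             # 11025.0

# Sampling rate for which mel_fmax = 8000 would be the full band.
print(2 * mel_fmax)                          # 16000.0

# Bandwidth left out of the mel-spectrogram with the current setting.
print(nyquist_hz(sampling_rate) - mel_fmax)  # 3025.0
```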

I also noticed that you say you use 1-second segments, but segment_length is set to 16000. That is 1 second at a 16 kHz sampling rate, but only about 0.73 seconds at 22 kHz, so you might want to change segment_length to 22050. I know that HiFi-GAN only uses 0.5-second segments and gets really good results. I'm not sure what the quality impact of using fewer or more samples would be, but longer segments will certainly take longer to train.
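
A quick sketch of the segment-length arithmetic, again assuming a sampling rate of exactly 22050 Hz. The helper names segment_seconds and samples_for are hypothetical, not taken from the repo.

```python
def segment_seconds(segment_length: int, sampling_rate: int) -> float:
    """Duration in seconds of a segment of `segment_length` samples."""
    return segment_length / sampling_rate


def samples_for(seconds: float, sampling_rate: int) -> int:
    """Number of samples needed to cover `seconds` of audio."""
    return round(seconds * sampling_rate)


sampling_rate = 22050  # assumed value behind "22 kHz"

# The current setting covers less than one second at 22050 Hz.
print(round(segment_seconds(16000, sampling_rate), 3))  # 0.726

# Samples needed for 1 second and for 0.5 seconds.
print(samples_for(1.0, sampling_rate))  # 22050
print(samples_for(0.5, sampling_rate))  # 11025
```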


Mihai Cvasnievschi

DeepDubbed commented 3 years ago

This was referring to VocGAN