Hey! I'll start by saying I really appreciate all the work you do in implementing papers that don't come with code!
I noticed that your config generally uses a 22 kHz (22050 Hz) sampling rate.
Your mel-spectrogram settings limit mel_fmax to 8000.0 Hz. That is the Nyquist frequency of a 16 kHz sampling rate, so at 22050 Hz the mel bands ignore everything between 8 kHz and 11.025 kHz. You should consider using mel_fmax: 11025.0. It's also possible that having only 80 mel channels limits result quality somewhat, so it might be worth experimenting with more channels.
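To make the arithmetic concrete, here is a minimal sketch of the Nyquist calculation behind the mel_fmax suggestion. The helper name `nyquist_hz` is hypothetical and not part of your code; the only assumption is that mel_fmax should not exceed half the sampling rate.

```python
def nyquist_hz(sampling_rate: int) -> float:
    """Highest frequency representable at the given sampling rate (hypothetical helper)."""
    return sampling_rate / 2.0


# mel_fmax = 8000.0 is exactly the upper limit of 16 kHz audio...
print(nyquist_hz(16000))  # 8000.0
# ...while 22050 Hz audio carries content up to 11025 Hz.
print(nyquist_hz(22050))  # 11025.0
```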
I also noticed that you say you use 1-second segments, but segment_length is set to 16000. That is 1 second only at a 16 kHz sampling rate; at 22050 Hz it is about 0.73 seconds, so you might want to change segment_length to 22050. I know HiFi-GAN trains on even shorter segments (its published config uses segment_size 8192, about 0.37 seconds at 22050 Hz) and still gets really good results, so I'm not sure what the quality impact of using fewer or more samples will be. Longer segments will certainly make training slower, though.
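A minimal sketch of the segment-length conversion, assuming segment_length is counted in raw audio samples. The helper names `segment_seconds` and `samples_for` are hypothetical and only illustrate the arithmetic.

```python
def segment_seconds(segment_length: int, sampling_rate: int) -> float:
    """Duration in seconds of a segment of `segment_length` samples (hypothetical helper)."""
    return segment_length / sampling_rate


def samples_for(seconds: float, sampling_rate: int) -> int:
    """Number of samples needed for a segment of the given duration (hypothetical helper)."""
    return round(seconds * sampling_rate)


# The current setting at 22050 Hz is shorter than 1 second.
print(round(segment_seconds(16000, 22050), 3))  # 0.726
# A true 1-second segment at 22050 Hz.
print(samples_for(1.0, 22050))  # 22050
```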
Mihai Cvasnievschi