rishikksh20 / Fre-GAN-pytorch

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
MIT License
101 stars 33 forks

comparison with univnet #1

Open thepowerfuldeez opened 3 years ago

thepowerfuldeez commented 3 years ago

Hi! How does this work compare with UnivNet, for which you already implemented code: https://github.com/rishikksh20/UnivNet-pytorch? This paper is a little newer, but AFAIK they're more concerned with the model's generalizability to unseen speakers, while this work focuses on overall quality (especially in high frequencies). Can you maybe elaborate?

rishikksh20 commented 3 years ago

@thepowerfuldeez Fre-GAN is better than UnivNet

thepowerfuldeez commented 3 years ago

Have you tried training on LJSpeech or your own dataset? How many iterations are needed compared with HiFi-GAN? Do you have checkpoints somewhere?

rishikksh20 commented 3 years ago

I tried it on my own dataset; it takes 150k iterations to generate excellent voice, whereas HiFi-GAN usually takes 1M steps for the same quality.

rishikksh20 commented 3 years ago

It only takes 2 days to reach 150k iterations.

thepowerfuldeez commented 3 years ago

Got it, thanks.

thepowerfuldeez commented 3 years ago

Tried it out. I compared the publicly available universal v1 HiFi-GAN (trained for 2.5M iterations on VCTK) with this one trained for 150k iterations on the new Hi-Fi TTS dataset (5 times more data). It sounds great, but I think it should be trained a bit more. Maybe 250k will be enough.

Iamgoofball commented 2 years ago

Out of curiosity how many GPUs did you train with, and which ones?

thepowerfuldeez commented 2 years ago

3x RTX 3090 with batch size 16, but I can confirm that Fre-GAN trains much faster than HiFi-GAN.