rishikksh20 / HiFi-GAN

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License
81 stars 20 forks

multiband-hifigan #2

Open nukes opened 3 years ago

nukes commented 3 years ago

Hi, did you try the multi-band HiFi-GAN idea?

rishikksh20 commented 3 years ago

Nope, I haven't tried it yet, but I am planning to.

nukes commented 3 years ago

Hi, any update on this? I tried the idea, but the result is not good. I use a combination of full-band STFT, sub-band STFT, mel, and adversarial losses, and the predicted waveform has an artifact in a specific frequency bin. After 400K steps, the artifact still has not disappeared. Did you run into the same issue, and do you still use the mel loss as part of the generator loss?
image
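
A minimal sketch of how the four terms mentioned above could be combined into one generator loss. The weights are placeholders (assumptions): neither poster states the values they used, and `generator_loss`, `LOSS_WEIGHTS`, and the `active` argument are hypothetical names introduced only for illustration. The per-term values would come from the actual training code.

```python
# Hypothetical weights: the thread does not state the values that were used.
LOSS_WEIGHTS = {
    "fullband_stft": 1.0,
    "subband_stft": 1.0,
    "mel": 45.0,
    "adv": 1.0,
}


def generator_loss(terms, weights=LOSS_WEIGHTS, active=None):
    """Return the weighted sum of the loss terms listed in `active`.

    terms   -- dict mapping a loss name to its already-computed scalar value
    weights -- dict mapping a loss name to its weight
    active  -- iterable of loss names to include (default: every term given)
    """
    names = list(terms) if active is None else list(active)
    return sum(weights[name] * terms[name] for name in names)


# Made-up scalar values, only to show the call.
terms = {"fullband_stft": 0.8, "subband_stft": 0.9, "mel": 0.02, "adv": 1.5}
print(generator_loss(terms))
print(generator_loss(terms, active=["fullband_stft", "subband_stft", "adv"]))
```

Keeping the set of active terms as an argument makes it easy to test whether the artifact depends on one particular term, for example by training with and without the mel loss.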

rishikksh20 commented 3 years ago

@nukes I trained it for around 1M steps. These artifact bands disappeared around 800k, and the quality is also good.

nukes commented 3 years ago

Good news! What I observe is that this artifact appears periodically: for example, it disappears at 300k, then reappears at 310k. Did you observe the same pattern? And do you use the mel loss?

rishikksh20 commented 3 years ago

@nukes Yes, after 800k that periodicity decreased, and most of the time there are few or no artifacts. The mel loss throws an error because the generated audio sometimes exceeds a value of 1, which causes a problem when we convert the wav to mels for the loss calculation. It does not happen often, but it does sometimes throw an error, mostly around 20k to 40k steps. So I start training with the mel loss, adversarial loss, STFT loss, and sub-band STFT loss, but around 20k, when the mel loss error pops up, I just comment out the mel loss, and for the remaining training I use only the STFT and sub-band STFT losses with the adversarial loss.
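
One possible alternative to commenting out the mel loss, not something tried in this thread, is to clip the generated waveform into [-1, 1] before converting it to mels. Below is a framework-agnostic sketch on plain Python lists. `clamp_waveform` is a hypothetical helper, and whether clipping affects training quality is untested here. In PyTorch the equivalent call would be `torch.clamp(wav, -1.0, 1.0)`.

```python
def clamp_waveform(samples, limit=1.0):
    """Clip every sample into [-limit, limit].

    Returns the clipped samples and the number of samples that were
    out of range, so the training loop can log how often this happens.
    """
    clipped = [max(-limit, min(limit, s)) for s in samples]
    n_out_of_range = sum(1 for s in samples if abs(s) > limit)
    return clipped, n_out_of_range


# Made-up generator output with two samples outside [-1, 1].
generated = [0.2, -0.7, 1.3, -1.05, 0.99]
safe, n_bad = clamp_waveform(generated)
print(safe, n_bad)
```

Logging the out-of-range count per step would also show whether the overshoot is really limited to the 20k to 40k range reported above.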

nukes commented 3 years ago

Got it! I am still training my model, and I will let you know the result once it reaches 800k.

nukes commented 3 years ago

Also, do you think it is worth trying a MultiStepLR learning rate scheduler, just like mb-melgan? I saw that the sub-band loss fluctuates dramatically, while the mb-melgan learning curve is much smoother and its periodic artifact disappears around 300k-400k.
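
For reference, a MultiStepLR schedule multiplies the learning rate by a factor `gamma` each time training passes a milestone. The sketch below computes that rule in plain Python. The base rate, milestones, and `gamma` are placeholder values (assumptions), not the mb-melgan settings. In PyTorch the same schedule is provided by `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma)`.

```python
from bisect import bisect_right


def multistep_lr(step, base_lr=1e-4, milestones=(200_000, 400_000, 600_000), gamma=0.5):
    """Learning rate at `step` under a MultiStepLR-style schedule.

    The rate is multiplied by `gamma` once for every milestone that is
    less than or equal to `step`. `milestones` must be sorted ascending.
    All default values here are placeholders.
    """
    n_decays = bisect_right(milestones, step)
    return base_lr * gamma ** n_decays


for step in (0, 199_999, 200_000, 450_000, 800_000):
    print(step, multistep_lr(step))
```

A step-wise decay like this keeps the rate constant between milestones, which may make it easier to tell whether the loss fluctuations are tied to the learning rate.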

rishikksh20 commented 3 years ago

@nukes Yeah, I have the same thought on that.

nukes commented 3 years ago

Hi, I tried the "mb-hifigan" idea, but the result is not good. In the high-frequency bins the structure is quite blurry, while the original HiFi-GAN (org-hifigan) performs much better in those bins. Did you see the same result? org-hifigan: image mb-hifigan: image

ysujiang commented 2 years ago

Hi, what's the result of mb-hifigan now? Is it better?