maum-ai / phaseaug

ICASSP 2023 Accepted
https://maum-ai.github.io/phaseaug/
BSD 3-Clause "New" or "Revised" License

Will it add new inference cost to the vocoder? #1

Closed: JohnHerry closed this issue 1 year ago

JohnHerry commented 1 year ago

Thanks for your work. I am looking for a training-auxiliary component for HiFi-GAN that can help it learn periodicity without adding any new parameters to the Generator. Does PhaseAug just augment the training data in terms of phase? Will it introduce new trainable parameters to the vocoder model? We are trying to speed up the HiFi-GAN model, but the reduced structure and parameter count lead to more significant high-frequency distortion in the spectrum. Will PhaseAug improve the ability to reconstruct the signal in the high-frequency band? I would like to see speech spectrograms comparing the results before and after PhaseAug. Thanks.

junjun3518 commented 1 year ago

Hi John! Our method just adds a differentiable operation, PhaseAug, during training. During inference, the model is identical to a normal HiFi-GAN. If you check the models.py file, you will find that the Generator is identical to the original HiFi-GAN code. So I think PhaseAug is the kind of training-auxiliary component you want.

Regarding your second question, unfortunately we have not tried our method on the smaller HiFi-GAN variants (such as V2 and V3), but I hope it could reduce the distortion. We will upload more gt+phaseaug samples to our demo page, so please take a look after we upload them.
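To make "augmenting on phase" concrete, here is a minimal sketch of a phase rotation that leaves the magnitude spectrum untouched. It is not the repository's implementation: it assumes a single whole-signal DFT in pure Python rather than the differentiable operation PhaseAug uses in training, it draws the rotation angles uniformly as a placeholder, and the names `dft`, `idft`, and `rotate_phase` are hypothetical. The point it illustrates is that such an operation has no trainable parameters and acts only on waveforms, so it can sit in the training loop without changing the Generator.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def rotate_phase(signal, rng, max_shift=math.pi):
    """Rotate the phase of each frequency bin by a random angle.

    Hypothetical helper: the uniform angle distribution is an assumption
    made for this sketch, not taken from the PhaseAug code.
    """
    n = len(signal)
    rotated = dft(signal)
    # Bin 0 (DC) and, for even n, bin n/2 (Nyquist) stay unchanged so that
    # the output remains real-valued.
    for k in range(1, (n + 1) // 2):
        phi = rng.uniform(-max_shift, max_shift)
        rotated[k] = rotated[k] * cmath.exp(1j * phi)
        # Keep conjugate symmetry so the inverse transform is real.
        rotated[n - k] = rotated[k].conjugate()
    return [value.real for value in idft(rotated)]


# Toy "waveform": two sinusoids sampled at 16 points.
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 3 * t / 16) + 0.5 * math.cos(2 * math.pi * 5 * t / 16)
          for t in range(16)]
augmented = rotate_phase(signal, rng)

# Compare magnitude spectra before and after the rotation.
mag_before = [abs(c) for c in dft(signal)]
mag_after = [abs(c) for c in dft(augmented)]
max_mag_diff = max(abs(a - b) for a, b in zip(mag_before, mag_after))
max_wave_diff = max(abs(a - b) for a, b in zip(signal, augmented))
print(max_mag_diff, max_wave_diff)
```

In this toy case the waveform changes while the magnitude spectrum stays the same up to floating-point error. At inference time no such step runs, which matches the statement above that inference is identical to a normal HiFi-GAN.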

JohnHerry commented 1 year ago

Thanks, I will give it a try.