charliezjw closed this issue 1 year ago
Hi Charlie,
You did nothing wrong. The generation script provided in the repository uses Griffin-Lim to convert the generated mel-spectrogram to an audio waveform, and it is Griffin-Lim that produces this metallic effect. For better audio fidelity, you can replace Griffin-Lim with a neural vocoder. Many pre-trained vocoders are already available; in our paper we use a pre-trained HiFi-GAN. As an additional step, you could also fine-tune this pre-trained vocoder on Daft-Exprt predictions. Please refer to the README of the repository for more information.
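For intuition on why the waveform has to be estimated at all: a spectrogram keeps only magnitudes, so the phase must be reconstructed somehow, and Griffin-Lim does this iteratively. The toy sketch below is not Griffin-Lim and is not code from the repository. It is a standard-library illustration, on a made-up 64-sample frame with arbitrary frequencies and phase offsets, showing that a signal with exactly the same magnitude spectrum can still have a very different waveform once the original phase is discarded.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    """Naive inverse DFT; returns only the real part of each sample."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


# Hypothetical toy frame: two sinusoids with non-zero phase offsets.
n = 64
frame = [
    math.sin(2 * math.pi * 3 * t / n + 0.7)
    + 0.5 * math.sin(2 * math.pi * 7 * t / n + 2.1)
    for t in range(n)
]

spectrum = dft(frame)
magnitude = [abs(c) for c in spectrum]  # the part a spectrogram retains

# With the true phase, the inverse transform recovers the frame.
exact = idft(spectrum)

# With the phase thrown away (set to zero), the magnitudes are unchanged
# but the waveform is not the original one.
zero_phase = idft([complex(m, 0.0) for m in magnitude])

err_exact = max(abs(a - b) for a, b in zip(frame, exact))
err_zero = max(abs(a - b) for a, b in zip(frame, zero_phase))
same_mag = max(abs(abs(c) - m) for c, m in zip(dft(zero_phase), magnitude))

print(f"max error with true phase:    {err_exact:.2e}")
print(f"max error with zero phase:    {err_zero:.2e}")
print(f"magnitude mismatch (zero ph): {same_mag:.2e}")
```

Griffin-Lim starts from a guess like this and refines the phase over several iterations, but the result is still an estimate; a neural vocoder such as HiFi-GAN instead learns to generate the waveform directly from the mel-spectrogram.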
Closing due to inactivity
Hi,
It's fantastic work with excellent samples shown on the demo page.
But when I try to reproduce some results, the output sounds metallic. I am using the checkpoint file you released, "DaftExprt_LJ_ESD_22kHz", and I built the exact Docker environment you provided.
I did not change any code, but the TTS output sounds as follows, very metallic (I converted it to .mp4 so GitHub can support it):
https://github.com/ubisoft/ubisoft-laforge-daft-exprt/assets/3964282/026974b2-886c-43d2-80e0-7f2e52f0b7b5
https://github.com/ubisoft/ubisoft-laforge-daft-exprt/assets/3964282/80e05d23-20b1-4ba5-96b9-1b98c81fcf02
https://github.com/ubisoft/ubisoft-laforge-daft-exprt/assets/3964282/ec189b4d-7821-4162-9d57-b54cf3426799
https://github.com/ubisoft/ubisoft-laforge-daft-exprt/assets/3964282/2814cead-0c32-478b-8766-0ed96a2cd9c9
https://github.com/ubisoft/ubisoft-laforge-daft-exprt/assets/3964282/5afd9dc3-acd2-4391-b9a9-ef1c4fb95a16
I am also attaching the original .wav files here just in case: Daft_debug_samples.zip
Did I do something wrong, or miss some steps?
Thank you very much! Charlie