lucidrains / e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
MIT License

[WIP]Support BigVGAN vocoder #5

Closed: chenht2021 closed this 1 month ago

chenht2021 commented 2 months ago

Maybe someone wants to use BigVGAN.

If you do not want to mix in BigVGAN's code, you can use this branch.

The notable difference between BigVGAN's mel_spec and torchaudio.transforms.MelSpectrogram is here. BigVGAN applies one extra sqrt operation. The other small differences are just different default values (such as norm, or the mel_basis type: one is htk, the other is slaney).
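To make the two differences above concrete, here is a minimal stdlib-only sketch. It is not BigVGAN's or torchaudio's actual code. The function names, the `eps` value, and the naive DFT are assumptions for illustration, and it assumes the comparison is against a power spectrogram (squared magnitude), so the extra sqrt turns power into magnitude. The two `hz_to_mel_*` helpers show the usual HTK and Slaney mel-scale formulas, which is where the mel_basis type differs.

```python
import cmath
import math


def dft_bin(frame, k):
    """Complex DFT coefficient k of a real-valued frame (naive sum, illustration only)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))


def power_spec(frame):
    """Squared magnitude |X_k|^2 for the non-redundant bins (no sqrt applied)."""
    return [abs(dft_bin(frame, k)) ** 2 for k in range(len(frame) // 2 + 1)]


def magnitude_spec(frame, eps=1e-9):
    """Same bins with the extra sqrt applied; eps is an assumed stabiliser."""
    return [math.sqrt(p + eps) for p in power_spec(frame)]


def hz_to_mel_htk(f):
    """HTK mel scale: purely logarithmic."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def hz_to_mel_slaney(f):
    """Slaney mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0                 # Hz per mel in the linear region
    min_log_hz = 1000.0                # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp    # mel value at 1 kHz
    logstep = math.log(6.4) / 27.0     # step size in the logarithmic region
    if f < min_log_hz:
        return f / f_sp
    return min_log_mel + math.log(f / min_log_hz) / logstep


if __name__ == "__main__":
    # One period of a sine over 8 samples, so the energy sits in bin 1.
    frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]
    print("power    :", [round(v, 3) for v in power_spec(frame)])
    print("magnitude:", [round(v, 3) for v in magnitude_spec(frame)])
    print("1 kHz in mel, htk vs slaney:", hz_to_mel_htk(1000.0), hz_to_mel_slaney(1000.0))
```

Because the two scales assign very different mel values to the same frequency, filterbanks built from them are not interchangeable. A vocoder trained on one representation will likely need mels computed the same way.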

wetdog commented 2 months ago

That's great. I think the default config that manmay put in is compatible with hifi-gan and other vocoders that use the same mel representation, for backwards comparison.

lucidrains commented 2 months ago

@chenht2021 thanks for the pull request

i've made it so you can just pass in your custom mel spec module here