lucidrains / e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
MIT License

Make the default mel spec compatible with vocos #13

Closed: lucasnewman closed this 1 month ago

lucasnewman commented 1 month ago

Vocos is a really small and fast vocoder. We previously used it for Voicebox because it handles both mel spec and EnCodec-based vocoding.

This change makes their particular mel spec recipe the default (it seems to be a common spec for TTS systems), so you can pip install vocos, pass vocos.decode to the sampling function, and get audio output.

This isn't strictly necessary to take, but I thought it might be useful for folks who are hoping to train a working network 'out of the box' without needing to think about the transform!
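For anyone wondering what "their particular mel spec recipe" amounts to, here is a rough sketch. The numbers (24 kHz audio, 1024-point FFT, hop of 256, 100 mel bins, log with a 1e-7 floor) are my understanding of the Vocos 24 kHz mel model and are assumptions here, not values taken from this PR. The names MelSpecConfig, safe_log and build_transform are hypothetical helpers for illustration only, so check the actual defaults in the vocos and e2-tts-pytorch sources.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MelSpecConfig:
    # Assumed values, believed to match the Vocos 24 kHz mel recipe.
    # Verify against the vocos / e2-tts-pytorch sources before relying on them.
    sample_rate: int = 24_000
    n_fft: int = 1024
    hop_length: int = 256
    n_mels: int = 100

    def num_frames(self, num_samples: int) -> int:
        """Frames from a centered STFT: one per full hop, plus one."""
        return num_samples // self.hop_length + 1

    def frames_per_second(self) -> float:
        """Mel frame rate implied by the sample rate and hop length."""
        return self.sample_rate / self.hop_length


def safe_log(magnitude: float, floor: float = 1e-7) -> float:
    """Log-compress a mel magnitude, clamping first to avoid log(0)."""
    return math.log(max(magnitude, floor))


def build_transform(cfg: MelSpecConfig):
    """Build a matching torchaudio transform (requires torchaudio).

    power=1 gives a magnitude (not power) spectrogram; the log above
    would then be applied to its output.
    """
    import torchaudio  # imported lazily so the rest runs without it

    return torchaudio.transforms.MelSpectrogram(
        sample_rate=cfg.sample_rate,
        n_fft=cfg.n_fft,
        hop_length=cfg.hop_length,
        n_mels=cfg.n_mels,
        power=1,
        center=True,
    )


cfg = MelSpecConfig()
frames_for_one_second = cfg.num_frames(cfg.sample_rate)
```

The point of matching the recipe is that the network's output then has the same shape and scale the vocoder was trained on, so a decoder such as vocos.decode can be used without any conversion step in between.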

lucidrains commented 1 month ago

@lucasnewman let's do it! thank you Lucas!