Vocos is a really small and fast vocoder. We previously used it for Voicebox because it handles both mel spec and Encodec-based vocoding.
This change uses the Vocos mel spec recipe as the default (a seemingly common spec for TTS systems), so you can `pip install vocos` and then pass `vocos.decode` to the sampling function and get audio output.
This isn't strictly necessary to take, but I thought it might be useful for folks who are hoping to train a working network 'out of the box' without needing to think about the transform!
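
To illustrate the intended usage, here is a minimal sketch of the "pass a vocoder to the sampling function" pattern. The `sample` function and its `vocoder` argument below are hypothetical stand-ins, not this repo's actual signature, and `fake_vocoder` only exists so the sketch runs without downloading a model.

```python
from typing import Any, Callable, Optional


def sample(
    generate_mel: Callable[[], Any],
    vocoder: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Hypothetical sampling wrapper (an assumption, not the repo's API).

    Returns the generated mel spec as-is, or the vocoder's output
    (audio) when a vocoder callable is supplied.
    """
    mel = generate_mel()
    if vocoder is None:
        return mel
    return vocoder(mel)


def fake_vocoder(mel: Any) -> Any:
    """Placeholder vocoder: emits one value per mel frame."""
    return [sum(frame) for frame in mel]


# Two dummy "frames" standing in for a real mel spectrogram.
dummy_mel = [[0.5, 0.5], [1.0, 2.0]]

mel_out = sample(lambda: dummy_mel)                          # no vocoder: mel comes back
audio_out = sample(lambda: dummy_mel, vocoder=fake_vocoder)  # vocoder applied to the mel
```

With Vocos installed, the same pattern would presumably look like the following, assuming the pretrained `charactr/vocos-mel-24khz` checkpoint from the Vocos README is the one that matches the default mel spec:

```python
# from vocos import Vocos
# vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")
# audio = sample(generate_mel, vocoder=vocos.decode)
```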