gpt-omni / mini-omni2

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
https://arxiv.org/abs/2410.11190
MIT License

What role does SNAC play in this framework? #13

Open AjianIronSide opened 1 week ago

AjianIronSide commented 1 week ago

Hi,

I wonder what function the SNAC module actually serves. Can we think of it as a TTS module or not? And what would happen if we didn't use SNAC, or any other codec module, in the framework?

mini-omni commented 6 days ago

Hi, we use SNAC to encode audio, and the model predicts SNAC tokens as its audio output, which the SNAC decoder then converts back into a waveform. SNAC is a neural audio codec. If you use a different codec, such as EnCodec, you need to retrain the model to adapt it.
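To make "predict the SNAC tokens" more concrete: SNAC produces several code streams at different temporal resolutions rather than one flat sequence. In the `snac` Python package, `SNAC.encode` returns one code sequence per level and `SNAC.decode` maps them back to audio. The sketch below assumes the 24 kHz model's three levels in a 1:2:4 rate ratio and shows one hypothetical way to interleave them into 7 tokens per coarse frame and back. The helper names `flatten_snac` and `unflatten_snac` and the token order are illustrative assumptions, not necessarily what mini-omni2 does.

```python
from typing import List

# Assumption: 3 SNAC levels with a 1:2:4 rate ratio, i.e. per coarse frame
# level 0 has 1 code, level 1 has 2 codes, level 2 has 4 codes.
# The interleaving order here is illustrative, not mini-omni2's exact layout.


def flatten_snac(levels: List[List[int]]) -> List[int]:
    """Hypothetical helper: interleave 3 code levels into 7 tokens per coarse frame."""
    l0, l1, l2 = levels
    n = len(l0)
    if len(l1) != 2 * n or len(l2) != 4 * n:
        raise ValueError("expected level sizes in a 1:2:4 ratio")
    flat: List[int] = []
    for i in range(n):
        flat.append(l0[i])                 # 1 coarse code
        flat.extend(l1[2 * i:2 * i + 2])   # 2 medium-rate codes
        flat.extend(l2[4 * i:4 * i + 4])   # 4 fine-rate codes
    return flat


def unflatten_snac(flat: List[int]) -> List[List[int]]:
    """Hypothetical inverse: rebuild the 3 code levels a decoder would consume."""
    if len(flat) % 7 != 0:
        raise ValueError("token stream length must be a multiple of 7")
    l0: List[int] = []
    l1: List[int] = []
    l2: List[int] = []
    for i in range(0, len(flat), 7):
        frame = flat[i:i + 7]
        l0.append(frame[0])
        l1.extend(frame[1:3])
        l2.extend(frame[3:7])
    return [l0, l1, l2]


if __name__ == "__main__":
    levels = [[10, 11], [20, 21, 22, 23], [30, 31, 32, 33, 34, 35, 36, 37]]
    flat = flatten_snac(levels)
    print(flat)  # [10, 20, 21, 30, 31, 32, 33, 11, 22, 23, 34, 35, 36, 37]
    print(unflatten_snac(flat) == levels)  # True
```

This also suggests why swapping codecs requires retraining: the language model learns to emit tokens in one codec's vocabulary and layout, so EnCodec tokens (different codebooks, rates, and meanings) would not be interchangeable with SNAC tokens.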