open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.45k stars 379 forks

[Feature]: Flow-based Model #250

Closed: Ming-er closed this issue 1 month ago

Ming-er commented 1 month ago

Hi, many flow-based models have recently achieved SOTA TTS and TTA performance. Are there any plans to release code for flow-based models, such as Voicebox? Thanks.

jiaqili3 commented 1 month ago

Hi @Ming-er, yes, we do have plans to train flow matching models for TTS, and we'll definitely release them if they perform well. We welcome any advice on which models to reproduce. Thanks!