open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.45k stars 379 forks

Multi-speaker VITS & Hi-Fi TTS dataset structure #131

Closed: zyingt closed this 6 months ago

zyingt commented 7 months ago

✨ Description

This PR adds multi-speaker support to the existing VITS model. Users can synthesize speech in multiple voices and choose which speaker's voice to use. To test this PR, follow the instructions in the updated egs/tts/VITS/README.md.
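Multi-speaker TTS models commonly select a voice by mapping each speaker name to an integer ID, which then indexes a learned speaker-embedding table. The sketch below illustrates only that lookup step in plain Python. The function names and the `uid`/`speaker` metadata fields are hypothetical and chosen for illustration; they are not taken from Amphion's code, and the PR's actual implementation may differ.

```python
from typing import Dict, Iterable, List


def build_speaker_table(utterances: Iterable[dict]) -> Dict[str, int]:
    """Assign a stable integer ID to every distinct speaker name.

    Sorting the names keeps the mapping reproducible across runs,
    which matters because the IDs index a learned embedding table.
    """
    names = sorted({utt["speaker"] for utt in utterances})
    return {name: idx for idx, name in enumerate(names)}


def speaker_ids_for_batch(utterances: List[dict], table: Dict[str, int]) -> List[int]:
    """Look up the speaker ID for each utterance in a batch."""
    return [table[utt["speaker"]] for utt in utterances]


# Hypothetical metadata entries (field names are assumptions).
metadata = [
    {"uid": "utt_0001", "speaker": "spk_b"},
    {"uid": "utt_0002", "speaker": "spk_a"},
    {"uid": "utt_0003", "speaker": "spk_b"},
]

table = build_speaker_table(metadata)
batch_ids = speaker_ids_for_batch(metadata, table)

print(table)      # {'spk_a': 0, 'spk_b': 1}
print(batch_ids)  # [1, 0, 1]
```

At inference time, the same table would translate a requested speaker name into the ID passed to the model, so the table should be saved alongside the checkpoint.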

🚧 Related Issues

None

👨‍💻 Changes Proposed

[1] Enable multi-speaker VITS support

[2] Streamline Hi-Fi TTS dataset preprocessing

[3] Update the VITS dataset loader

[4] Improve model compatibility across different accelerate versions

[5] Apply Black formatting

🧑‍🤝‍🧑 Who Can Review?

@lmxue @RMSnow
