open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.5k stars, 386 forks

Amphion VALL-E new version release #220

Closed. jiaqili3 closed this 3 months ago

jiaqili3 commented 3 months ago

✨ Description

In this PR, we release an unofficial PyTorch implementation of VALL-E, a zero-shot voice cloning model based on neural codec language modeling. If trained properly, this model could match the performance reported in the original paper. This is a refined version of the first VALL-E implementation in Amphion: we have changed the underlying implementation to Llama to provide better model performance, faster training speed, and more readable code. It can be a great tool for users who want to learn about speech language models and their implementation.
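For readers new to the idea, here is a minimal sketch of what "neural codec language modeling" means at inference time: text tokens and the codec tokens of a short speaker prompt form a prefix, and a decoder-only language model continues the sequence with new codec tokens. This is not the code in this PR. The vocabulary layout, `build_prefix`, `generate`, and `make_toy_model` are all hypothetical names for illustration, and the "model" is a deterministic stub rather than a trained Llama-style network. The original VALL-E paper also uses a non-autoregressive stage for the remaining codebooks, which is omitted here.

```python
from typing import Callable, List

# Hypothetical vocabulary layout (an assumption, not Amphion's actual layout):
# text tokens occupy [0, TEXT_VOCAB); codec tokens are shifted above them.
TEXT_VOCAB = 100
CODEC_VOCAB = 1024
EOS = TEXT_VOCAB + CODEC_VOCAB  # end-of-sequence token id


def build_prefix(phonemes: List[int], prompt_codes: List[int]) -> List[int]:
    """Concatenate text tokens with the speaker prompt's (shifted) codec tokens."""
    return list(phonemes) + [TEXT_VOCAB + c for c in prompt_codes]


def generate(prefix: List[int],
             next_token: Callable[[List[int]], int],
             max_new: int = 50) -> List[int]:
    """Autoregressively extend the prefix; return only the new codec codes."""
    seq = list(prefix)
    new_codes: List[int] = []
    for _ in range(max_new):
        tok = next_token(seq)
        if tok == EOS:
            break
        seq.append(tok)
        new_codes.append(tok - TEXT_VOCAB)  # undo the shift
    return new_codes


def make_toy_model(prefix_len: int, n_new: int) -> Callable[[List[int]], int]:
    """Stub standing in for a trained LM: emits n_new codes, then EOS."""
    def next_token(seq: List[int]) -> int:
        if len(seq) >= prefix_len + n_new:
            return EOS
        return TEXT_VOCAB + (len(seq) % CODEC_VOCAB)
    return next_token


prefix = build_prefix(phonemes=[1, 2, 3], prompt_codes=[10, 20])
codes = generate(prefix, make_toy_model(len(prefix), n_new=4))
```

In a real system, the generated codes would be passed to a neural codec decoder to obtain a waveform in the prompt speaker's voice.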

🚧 Related Issues

None

👨‍💻 Changes Proposed

🧑‍🤝‍🧑 Who Can Review?

@HeCheng0625 @RMSnow @HarryHe11 @zhizhengwu

✅ Checklist

RMSnow commented 3 months ago

Do we have any pretrained models or a demo for this new VALL-E?

jiaqili3 commented 3 months ago

> Do we have any pretrained models or a demo for this new VALL-E?

This is detailed in the README in egs/tts/valle_v2, and demo.ipynb has also been uploaded for running inference with the pretrained weights.

jiaqili3 commented 3 months ago

Hi @RMSnow, thanks for your review! I've updated the code, and your previous review questions have been resolved.

RMSnow commented 3 months ago

Hi @jiaqili3, please update the demo.ipynb. Everything else looks good to me.

jiaqili3 commented 3 months ago

Updated. Thanks @RMSnow