open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.28k stars 363 forks

Add FreeVC implementation #201

Open Nugine opened 2 months ago

Nugine commented 2 months ago

✨ Description

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

This PR is part of our AIR6063 final project.

FYI, we also have another repo which refactors the training pipeline. Both the PR code and the custom code can produce good checkpoints.

Here are our checkpoints, trained with the PR code on a single NVIDIA RTX 4090

🚧 Related Issues

During the project, we opened several issues and another PR to help improve Amphion.

👨‍💻 Changes Proposed

🧑‍🤝‍🧑 Who Can Review?

[Please use the '@' symbol to mention any community member who is free to review the PR once the tests have passed. Feel free to tag members or contributors who might be interested in your PR.] @zhizhengwu @RMSnow @Adorable-Qin

✅ Checklist

SeanYouLaw commented 2 months ago

Here are some examples of our results:

https://github.com/open-mmlab/Amphion/assets/58773169/3256156a-d77a-4079-a437-6b163a8d0c12

https://github.com/open-mmlab/Amphion/assets/58773169/02a1e70e-ec1b-4755-b4e5-9e50ebdb4620

https://github.com/open-mmlab/Amphion/assets/58773169/cac3e57b-782e-43f7-96d5-48420c113f23


https://github.com/open-mmlab/Amphion/assets/58773169/0bec78cd-1543-44dd-9673-9842dd96f7fa

https://github.com/open-mmlab/Amphion/assets/58773169/a7faf334-d863-4d11-beb5-4bd220055a7b

https://github.com/open-mmlab/Amphion/assets/58773169/055a3d9b-572f-4079-811a-f15becbc2c83


https://github.com/open-mmlab/Amphion/assets/58773169/2fc73678-0e60-4665-89ce-52c96aeeaded

https://github.com/open-mmlab/Amphion/assets/58773169/56dec48f-cc1d-403c-af30-ef80b29343a7

https://github.com/open-mmlab/Amphion/assets/58773169/fa8e6091-70f0-431f-b809-82c4c9bee39d

RMSnow commented 2 months ago

The samples sound good. @Adorable-Qin, please check the code and documentation carefully.

SeanYouLaw commented 2 months ago

Here are some more examples of our results, generated with the checkpoint trained for 183 epochs (120k steps). The examples above are from the pretrained checkpoint.

https://github.com/open-mmlab/Amphion/assets/58773169/3b85076d-187c-4329-9405-f71eea734b89

https://github.com/open-mmlab/Amphion/assets/58773169/b3a1f57a-7e56-4ccc-a319-f7f22bdf11f5

https://github.com/open-mmlab/Amphion/assets/58773169/bbde9081-e414-41e8-848b-304176d871cd

https://github.com/open-mmlab/Amphion/assets/58773169/6754c030-5703-4b5f-afce-85e49282eb55

https://github.com/open-mmlab/Amphion/assets/58773169/0f7b0bc7-8b03-456e-9f59-4ba1b13be955

https://github.com/open-mmlab/Amphion/assets/58773169/de76a1da-95df-4aab-a727-1ccd24bcaaf0

https://github.com/open-mmlab/Amphion/assets/58773169/d849b433-5478-46ca-ae85-f9737bf6de57

https://github.com/open-mmlab/Amphion/assets/58773169/0e1dfd5b-c09f-4da9-842b-94acb49c2bdc

https://github.com/open-mmlab/Amphion/assets/58773169/6ae3b513-035c-42fb-8f0f-cf5041098850

Nugine commented 2 months ago

Our AutoDL server will expire tomorrow. Here is a demo video that records the training status.

https://github.com/open-mmlab/Amphion/assets/30099658/ca69347c-cd65-4052-8666-749900cb12ab