open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License

Add TransformerVC implementation #90

Closed ArkhamImp closed 5 months ago

ArkhamImp commented 8 months ago

The code has been tested, and the model is currently training.

lmxue commented 8 months ago

Thanks for your efforts. Here are some suggestions for a standard PR, especially one that integrates a new model into Amphion:

RMSnow commented 8 months ago

Thanks a lot, @ArkhamImp, for integrating our first VC model. For a "New Feature" PR, the review stage is stricter and usually takes several request-changes loops.

You can follow these two PRs as examples of how to provide inference samples for TransformerVC, so that we can make sure the code is bug-free:

  1. https://github.com/open-mmlab/Amphion/pull/56
  2. https://github.com/open-mmlab/Amphion/pull/14

A perfect "New Feature" PR will contain:

For this PR, I think you need to provide at least (1) and (2). You can also give an expected date for (3) so that others are informed. After (1)-(3), we can assign other developers to work with you on (4) and (5).

ArkhamImp commented 8 months ago

Voice Conversion Samples

Sample 1

Source

https://github.com/open-mmlab/Amphion/assets/16316402/f2f771c9-df52-481b-9750-f61b0c9d7ac7

Target

https://github.com/open-mmlab/Amphion/assets/16316402/78776ff3-d469-4813-bab2-a3dc3b48e1d9

Converted

https://github.com/open-mmlab/Amphion/assets/16316402/2fc2d40c-a1e3-43b3-a302-7acbe8993732

Sample 2

Source

https://github.com/open-mmlab/Amphion/assets/16316402/ff82282b-5f90-4f37-9050-994c1bc411c9

Target

https://github.com/open-mmlab/Amphion/assets/16316402/6a3b45a7-4475-4d78-a4e7-00d9757124db

Converted

https://github.com/open-mmlab/Amphion/assets/16316402/76130374-0d56-4682-b6f3-7fefc98ff512

ArkhamImp commented 7 months ago

Samples of VitsVC: https://x8gvg3n7v3.feishu.cn/docx/SPbnd0gHcowovGxCyOLcndfanAb?from=from_copylink

ArkhamImp commented 7 months ago

Ready to integrate.