open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
5.91k stars 452 forks

Add VC Noro model #247

Open kenxxxxx opened 3 months ago

kenxxxxx commented 3 months ago

✨ Description

In this PR, we release an unofficial PyTorch implementation of Noro, a Noise-Robust One-shot Voice Conversion (VC) system. The model converts the timbre of speech from a source speaker to that of a target speaker using a single reference utterance, while preserving the semantic content of the original speech. Noro introduces two components tailored for VC with noisy reference speech: a dual-branch reference encoding module and a noise-agnostic contrastive speaker loss.
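For readers unfamiliar with contrastive speaker losses, the general idea can be sketched as below. This is a minimal, dependency-free illustration and not the code in this PR: it assumes an InfoNCE-style formulation in which the embedding of a clean reference and the embedding of a noisy version of the same reference form a positive pair, while embeddings from other speakers in the batch act as negatives. The names `cosine_similarity`, `contrastive_speaker_loss`, and the `temperature` value are hypothetical, and an actual implementation would operate on PyTorch tensors rather than Python lists.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_speaker_loss(
    clean: List[Sequence[float]],
    noisy: List[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value, chosen for illustration
) -> float:
    """InfoNCE-style loss over a batch of speaker embeddings (assumed form).

    clean[i] and noisy[i] are assumed to come from the same speaker
    (positive pair); clean[i] and noisy[j] with j != i are negatives.
    """
    if not clean or len(clean) != len(noisy):
        raise ValueError("clean and noisy must be non-empty and equally long")

    total = 0.0
    for i, anchor in enumerate(clean):
        # Scaled similarity of this clean embedding to every noisy embedding.
        logits = [cosine_similarity(anchor, other) / temperature for other in noisy]
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Negative log-probability assigned to the positive pair.
        total += log_denominator - logits[i]
    return total / len(clean)


if __name__ == "__main__":
    clean_batch = [[1.0, 0.0], [0.0, 1.0]]
    noisy_batch = [[0.9, 0.1], [0.1, 0.9]]
    print(contrastive_speaker_loss(clean_batch, noisy_batch))
```

Minimizing such a loss pulls the clean and noisy embeddings of the same speaker together and pushes different speakers apart, which is one plausible way to make a reference encoder insensitive to noise.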

The main purpose of this PR is to provide a noise-robust VC solution that remains effective when the reference speech is noisy, making it suitable for real-world applications. We also explore the speaker representation capabilities hidden in the VC system by repurposing its reference encoder as a speaker encoder, which shows performance competitive with advanced self-supervised learning models.

To test this PR, follow the instructions in the updated README.md to set up the environment, train the model, and evaluate its performance under different acoustic environments.

🚧 Related Issues

None

👨‍💻 Changes Proposed

🧑‍🤝‍🧑 Who Can Review?

@RMSnow @HarryHe11 @Adorable-Qin

✅ Checklist

HarryHe11 commented 3 months ago

@RMSnow Thank you, Xueyao, for your detailed comments! @kenxxxxx Yuchen, please familiarize yourself with Git-based development and directly update your code on your fork so we can track your revision progress.

RMSnow commented 3 weeks ago

BTW, use black to format the code so that it passes the format check.