Open · kenxxxxx opened this pull request 3 months ago
@RMSnow Thank you, Xueyao, for your detailed comments! @kenxxxxx Yuchen, please familiarize yourself with Git-based development and push your revisions directly to this PR's branch on your fork so we can track your progress.
BTW, please format the code with `black` so that it passes the format check.
✨ Description
In this PR, we release an unofficial PyTorch implementation of Noro, a Noise-Robust One-shot Voice Conversion (VC) system. Noro converts the timbre of speech from a source speaker to that of a target speaker using only a single reference utterance, while preserving the semantic content of the source speech. To handle noisy reference speech, Noro introduces two components: a dual-branch reference encoding module and a noise-agnostic contrastive speaker loss.
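To make the idea of a "noise-agnostic contrastive speaker loss" concrete, here is a minimal sketch of a generic InfoNCE-style objective. It assumes that `clean[i]` and `noisy[i]` are speaker embeddings of the same reference utterance (clean versus noise-augmented) and that the other utterances in the batch act as negatives. This is an illustrative assumption, not Noro's exact formulation. The function names and the `temperature` default are hypothetical, and plain Python replaces PyTorch only to keep the example self-contained.

```python
import math
from typing import List

Vector = List[float]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def contrastive_speaker_loss(
    clean: List[Vector], noisy: List[Vector], temperature: float = 0.1
) -> float:
    """Hypothetical InfoNCE-style loss.

    clean[i] and noisy[i] embed the same reference utterance (positive pair);
    every noisy[j] with j != i in the batch serves as a negative.
    """
    if not clean or len(clean) != len(noisy):
        raise ValueError("clean and noisy batches must be non-empty and equal in size")
    total = 0.0
    for i, anchor in enumerate(clean):
        logits = [_cosine(anchor, other) / temperature for other in noisy]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # -log softmax probability of the matching (positive) pair
        total += log_denominator - logits[i]
    return total / len(clean)


clean = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
print(contrastive_speaker_loss(clean, noisy))                 # small: pairs are aligned
print(contrastive_speaker_loss(clean, noisy[1:] + noisy[:1]))  # large: pairs are mismatched
```

The loss is small when each noisy embedding sits closest to its own clean counterpart, regardless of the added noise. Minimizing it therefore pushes the encoder toward speaker representations that ignore noise.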
The main purpose of this PR is to provide a VC solution that remains effective even when the reference speech is noisy, which makes it suitable for real-world applications. We also explore the hidden speaker representation capabilities of the VC system by repurposing its reference encoder as a speaker encoder, which achieves performance competitive with advanced self-supervised learning (SSL) models.
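The sketch below shows one common way to use a trained reference encoder as a speaker encoder: mean-pool its frame-level outputs into a single utterance embedding, then compare two utterances by cosine similarity. The frame lists stand in for whatever the reference encoder actually outputs. The mean pooling, the function names, and the `0.7` threshold are illustrative assumptions, not code from this PR.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level features (T x D) into one D-dim utterance embedding."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two utterance embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(frames_a, frames_b, threshold: float = 0.7) -> bool:
    """Hypothetical verification decision; the threshold would normally be tuned on held-out trials."""
    return cosine_score(mean_pool(frames_a), mean_pool(frames_b)) >= threshold


# Toy frame-level outputs standing in for reference-encoder features
utt_a = [[1.0, 0.0], [0.8, 0.2]]
utt_b = [[0.9, 0.1]]
utt_c = [[0.0, 1.0]]
print(same_speaker(utt_a, utt_b))  # True
print(same_speaker(utt_a, utt_c))  # False
```

In practice, the decision threshold is usually chosen on a development set (for example, at the equal error rate) instead of being fixed in advance.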
To test this PR, follow the instructions in the updated README.md to set up the environment, train the model, and evaluate its performance under different acoustic environments.
🚧 Related Issues
None
👨‍💻 Changes Proposed
🧑‍🤝‍🧑 Who Can Review?
@RMSnow @HarryHe11 @Adorable-Qin
✅ Checklist