Official homepage: http://www.vc-challenge.org/
This repository provides the UNOFFICIAL reimplementation of FastSVC for the Singing Voice Conversion Challenge 2023 (SVCC23) starter kit. SVCC23 includes two tasks: in-domain and cross-domain singing voice conversion (SVC). In-domain SVC is trained with access to singing data from the target speaker, while cross-domain SVC is trained with access only to speech data from the target speaker.
This system uses phonetic posteriorgrams (PPGs) extracted by a pretrained ASR model, loudness, pitch, and speaker embeddings (x-vectors). The PPGs are upsampled by a scaling factor and fused with the downsampled loudness and pitch features through a FiLM block; the speaker embeddings are then added to the fused PPG, loudness, and pitch features.
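For intuition, below is a minimal PyTorch sketch of FiLM-style fusion as described above. The module name, feature dimensions, and shapes are hypothetical and do not mirror this repository's actual code; it only illustrates how the conditioning features can produce the scale and shift applied to the PPG stream, with the speaker embedding added afterwards.

```python
# Minimal FiLM-style fusion sketch (hypothetical names and shapes,
# NOT the repository's actual implementation).
import torch
import torch.nn as nn


class FiLMBlock(nn.Module):
    """Predicts a scale (gamma) and shift (beta) from a conditioning
    feature stream and applies them to the main feature stream."""

    def __init__(self, main_channels: int, cond_channels: int):
        super().__init__()
        self.to_gamma = nn.Conv1d(cond_channels, main_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv1d(cond_channels, main_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x:    (batch, main_channels, frames), e.g. upsampled PPGs
        # cond: (batch, cond_channels, frames), e.g. time-aligned loudness and pitch
        return self.to_gamma(cond) * x + self.to_beta(cond)


# Toy usage with made-up dimensions.
ppg = torch.randn(1, 144, 200)        # upsampled PPG frames
loudness_f0 = torch.randn(1, 2, 200)  # loudness and pitch, aligned to the PPG frames
spk_emb = torch.randn(1, 144)         # x-vector projected to the PPG channel width

film = FiLMBlock(main_channels=144, cond_channels=2)
fused = film(ppg, loudness_f0)              # FiLM fusion of PPGs with loudness/pitch
fused = fused + spk_emb.unsqueeze(-1)       # add speaker embedding, broadcast over time
```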
Please note that there are some differences between this system and the official paper. The original FastSVC system was made for 16kHz generation.
Specific changes:
You can listen to some samples from this reimplementation of the original FastSVC paper here. Help in improving it is welcome.
To help people get started with SVC, we also developed a decomposed version of FastSVC to improve training time.
Please refer to the recipe in egs/svcc23/baseline02
Please refer to the README file at egs/svcc23/fastsvc1/README.md
Please note that we will only give access to people who have signed the dataset's license agreement. To gain access to the SVCC23 dataset, please sign the license agreement in the registration form and submit it there.
Other external datasets can also be used; however, they have to be publicly available in order to encourage reproducible research. You can find more details about the specific rules of the challenge here.
$ git clone https://github.com/lesterphillip/SVCC23_FastSVC.git
$ cd SVCC23_FastSVC
$ python3 -m virtualenv venv
$ . ./venv/bin/activate
$ pip install -e .
$ ...
Please check egs/svcc23/fastsvc1/README.md for instructions on how to run the repository.
If you find the code helpful, please cite the following.
@inproceedings{liu2021fastsvc,
title={{FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation}},
author={Liu, Songxiang and Cao, Yuewen and Hu, Na and Su, Dan and Meng, Helen},
booktitle={IEEE International Conference on Multimedia and Expo (ICME)},
pages={1--6},
year={2021},
organization={IEEE}
}
Lester Phillip Violeta @ Nagoya University (@lesterphillip)
Songxiang Liu @ Tencent AI Lab (@liusongxiang)
Wen-Chin Huang @ Nagoya University (@unilight)
Lester Phillip Violeta @ Nagoya University (@lesterphillip)
Jiatong Shi @ Carnegie Mellon University (@ftshijt)
Songxiang Liu @ Tencent AI Lab (@liusongxiang)
Tomoki Toda @ Nagoya University
We would like to thank Ryuichi Yamamoto (@r9y9) for his valuable insights during the development of this repository.
The skeleton of this repository is also largely based on @kan-bayashi's awesome ParallelWaveGAN repository. Please check it out if you need to train any kind of vocoder.
We also used @chomeyama's HN-uSFGAN repository as a reference.
Please submit an issue if you encounter any bugs or have any questions about the repository. You may also contact the organizing team at the e-mail address below if you have any questions about SVCC itself.
svcc2023__at__vc-challenge.org (replace the __at__ with @)