singaligner

A compact audio-to-phoneme aligner for singing voice.

Our experiments use the Opencpop and NamineRitsu datasets. You can also train on your own dataset.

Once the data is prepared, follow these steps:

Create a virtual environment.
Define the dataloader and the collate function in utils/data_utils.py. You can inherit from the existing classes.
Import your dataloader into train.py and change trainset, valset and collate_fn in the prepare_dataloaders function.
Prepare a file named phone_set.json that contains the phone set of your dataset and put it at the root of data_dir.
Change data_dir in hparams.py to your data path.
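
The layout expected for phone_set.json is not described here, so check how utils/data_utils.py reads it before relying on the following. As a minimal sketch, assuming the file is a JSON list of unique phone symbols and that your transcriptions are available as space-separated phone strings, it could be generated as shown below. The helper names build_phone_set and write_phone_set, the example phones, and the path my_dataset are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def build_phone_set(transcriptions):
    """Collect the sorted unique phones from space-separated phone strings."""
    phones = set()
    for line in transcriptions:
        phones.update(line.split())
    return sorted(phones)


def write_phone_set(transcriptions, data_dir):
    """Write phone_set.json at the root of data_dir and return its path.

    Assumption: the file is a plain JSON list of phone symbols.
    """
    path = Path(data_dir) / "phone_set.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_phone_set(transcriptions), f, ensure_ascii=False, indent=2)
    return path


if __name__ == "__main__":
    # Hypothetical phone sequences; replace with the labels of your dataset.
    example = ["g an sh ou", "SP t ing AP"]
    print(write_phone_set(example, "my_dataset"))
```

If the loader turns out to expect a different structure, such as a mapping from phone to index, only the json.dump call needs to change.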

Run this command to start training:

CUDA_VISIBLE_DEVICES=0 python train.py --output_directory experiments/exp_name/ --log_directory tensorboard_logs

Run this command to start inference:

CUDA_VISIBLE_DEVICES=0 python infer_prob.py --checkpoint_path experiments/exp_name/checkpoint_name \
--output_dir experiments/exp_name/

Citation

@inproceedings{zheng2023compact,
title={A Compact Phoneme-To-Audio Aligner for Singing Voice},
author={Zheng, Meizhen and Bai, Peng and Shi, Xiaodong},
booktitle={International Conference on Advanced Data Mining and Applications},
pages={183--197},
year={2023},
organization={Springer}
}
