A compact phoneme-to-audio aligner for singing voice.
The datasets used in our experiments are Opencpop and NamineRitsu. You can also experiment with your own datasets.
Once the data is prepared, train the model with:
CUDA_VISIBLE_DEVICES=0 python train.py --output_directory experiments/exp_name/ --log_directory tensorboard_logs
Then run inference with a trained checkpoint:
CUDA_VISIBLE_DEVICES=0 python infer_prob.py --checkpoint_path experiments/exp_name/checkpoint_name \
--output_dir experiments/exp_name/
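The name infer_prob.py suggests the inference step produces phoneme probabilities, but its actual output format is not documented here. As a rough illustration of how frame-level phoneme probabilities can be turned into an alignment, here is a minimal sketch of generic monotonic (Viterbi-style) forced alignment. It assumes a hypothetical (frames x phoneme-set) log-probability matrix and a known phoneme sequence; the function monotonic_align is hypothetical and is not necessarily the method used in the paper or in this repository.

```python
import numpy as np


def monotonic_align(log_probs: np.ndarray, phonemes: list) -> list:
    """Hypothetical helper: align a known phoneme sequence to audio frames.

    Assumes log_probs has shape (T, V): per-frame log-probabilities over a
    V-symbol phoneme set (an assumption, not this repo's documented output).
    phonemes lists phoneme ids in sung order; each gets at least one frame.
    Returns (phoneme_id, start_frame, end_frame) segments, end exclusive.
    """
    T, N = log_probs.shape[0], len(phonemes)
    if N == 0 or N > T:
        raise ValueError("need 1 <= len(phonemes) <= number of frames")
    # emit[t, n]: log-probability of the n-th phoneme in the sequence at frame t
    emit = log_probs[:, phonemes]
    score = np.full((T, N), -np.inf)
    # advanced[t, n] is True if the best path reached (t, n) from phoneme n-1
    advanced = np.zeros((T, N), dtype=bool)
    score[0, 0] = emit[0, 0]  # the path must start on the first phoneme
    for t in range(1, T):
        stay = score[t - 1]
        advance = np.concatenate(([-np.inf], score[t - 1, :-1]))
        advanced[t] = advance > stay  # ties prefer staying on the same phoneme
        score[t] = np.maximum(stay, advance) + emit[t]
    # Backtrack from the last phoneme at the last frame to get per-frame labels.
    labels = np.empty(T, dtype=int)
    n = N - 1
    for t in range(T - 1, -1, -1):
        labels[t] = n
        if t > 0 and advanced[t, n]:
            n -= 1
    # Collapse runs of identical labels into (phoneme_id, start, end) segments.
    segments = []
    start = 0
    for t in range(1, T + 1):
        if t == T or labels[t] != labels[start]:
            segments.append((phonemes[labels[start]], start, t))
            start = t
    return segments


# Toy example: 6 frames, 4-symbol phoneme set, sung sequence [2, 0, 3].
probs = np.full((6, 4), 0.1)
probs[0:2, 2] = 0.7
probs[2:5, 0] = 0.7
probs[5, 3] = 0.7
print(monotonic_align(np.log(probs), [2, 0, 3]))
```

The search is monotonic: each frame either stays on the current phoneme or moves to the next one, so the output always preserves the order of the phoneme sequence. This assumes all log-probabilities are finite; in the toy example the result is [(2, 0, 2), (0, 2, 5), (3, 5, 6)].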
If you use this code, please cite:
@inproceedings{zheng2023compact,
title={A Compact Phoneme-To-Audio Aligner for Singing Voice},
author={Zheng, Meizhen and Bai, Peng and Shi, Xiaodong},
booktitle={International Conference on Advanced Data Mining and Applications},
pages={183--197},
year={2023},
organization={Springer}
}