nestyme closed this issue 1 year ago
You will need a G2P module/dictionary to convert text to phonemes and a duration predictor to get their durations, no matter which version of DiffSinger you are using. The original DiffSinger has an integrated Chinese G2P module and a duration predictor bound to the acoustic model. In this repository, you need to train a duration predictor as part of the variance model.
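For illustration, dictionary-based G2P is essentially a lookup from syllables (or words) to phoneme sequences. The sketch below is hypothetical — the dictionary entries, function name, and fallback token are illustrative only, not DiffSinger's actual API or dictionary format:

```python
# Minimal sketch of dictionary-based G2P: map each syllable to its phonemes.
# The entries below are made up for illustration; a real DiffSinger
# dictionary file defines the actual syllable-to-phoneme mapping.
G2P_DICT = {
    "ni": ["n", "i"],
    "hao": ["h", "ao"],
}

def g2p(syllables):
    """Convert a list of syllables to a flat phoneme sequence."""
    phonemes = []
    for syl in syllables:
        # Fall back to a silence/space token for unknown syllables
        # (the token name "SP" is an assumption here).
        phonemes.extend(G2P_DICT.get(syl, ["SP"]))
    return phonemes

print(g2p(["ni", "hao"]))  # → ['n', 'i', 'h', 'ao']
```

A duration predictor would then assign a length to each phoneme in this sequence; in this repository that predictor is trained inside the variance model rather than shipped with the acoustic model.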
@yqzhishen thank you so much for the quick and informative response! Sorry, but do you have any documentation on how to prepare/train the variance model from scratch? I only found migration documentation in the variance folder.
Prepare an acoustic dataset and extend it to a variance dataset - this is the standard workflow.
See the MakeDiffSinger repository for useful scripts and introductions.
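As a rough illustration of "extending" a dataset, the idea is to take the per-sample labels an acoustic dataset already has and add the extra columns a variance dataset needs. The sketch below is an assumption-heavy toy: the column names loosely follow the transcriptions.csv convention seen in MakeDiffSinger, but treat the exact schema (and the placeholder "rest" note label) as hypothetical and check that repository's docs for the real format:

```python
# Toy sketch: extend acoustic-style label rows with an extra column for
# variance training. Real data would get note labels from pitch annotation,
# not placeholders; the schema here is an assumption, not the actual spec.
acoustic_rows = [
    {"name": "sample_001", "ph_seq": "n i h ao", "ph_dur": "0.1 0.2 0.1 0.3"},
]

def extend_to_variance(rows):
    out = []
    for row in rows:
        row = dict(row)  # don't mutate the input
        n = len(row["ph_seq"].split())
        # Placeholder: one "rest" note per phoneme; real annotation tools
        # from MakeDiffSinger would fill this in properly.
        row["note_seq"] = " ".join(["rest"] * n)
        out.append(row)
    return out

variance_rows = extend_to_variance(acoustic_rows)
print(variance_rows[0]["note_seq"])  # → rest rest rest rest
```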
thank you!
Hello! Thank you for this repository. I have a question regarding inference input parameters. I want to run a text+f0 to song engine. I trained the acoustic model, but it looks like it still requires phoneme durations. So, is it correct that I cannot run inference with this version of DiffSinger without ground-truth phoneme durations for any input, as in the original repo (which would make DiffSinger more like voice conversion)? Thank you!