An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
The absolute values of some variance parameters are too large. Normalizing them to a similar scale may improve performance and benefit fine-tuning.
Sometimes linear normalization has trade-offs between precision and range. For example, delta_pitch needs more precision around 0, but also needs a wider range than the current default ±8 keys in some situations.
TODO
[ ] New option to normalize variance parameters to (-1, 1) before they are embedded into the model
[ ] Multiple types of normalization: linear, tanh, etc.
[ ] Generalize configuration schemas of all parameters
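
To make the linear vs. tanh trade-off concrete, here is a minimal sketch of the two normalization types named above. It is not the project's implementation: the function names, the `max_abs` and `scale` parameters, and the value `scale=4.0` are hypothetical and chosen only for illustration. "Keys" is assumed to mean semitones.

```python
import math


def normalize_linear(x: float, max_abs: float = 8.0) -> float:
    """Map [-max_abs, max_abs] linearly onto [-1, 1].

    Values outside the range are clipped, so anything beyond
    max_abs becomes indistinguishable from max_abs itself.
    """
    clipped = max(-max_abs, min(max_abs, x))
    return clipped / max_abs


def normalize_tanh(x: float, scale: float = 4.0) -> float:
    """Squash any real value into (-1, 1).

    The slope at 0 is 1 / scale, so a smaller (hypothetical) scale
    gives more resolution near 0, while large values are compressed
    rather than clipped.
    """
    return math.tanh(x / scale)


def denormalize_tanh(y: float, scale: float = 4.0) -> float:
    """Invert normalize_tanh; y must lie strictly inside (-1, 1)."""
    return scale * math.atanh(y)


if __name__ == "__main__":
    # Compare both mappings for small and large delta_pitch values (in keys).
    for delta in (0.1, 1.0, 8.0, 12.0):
        print(delta, normalize_linear(delta), normalize_tanh(delta))
```

With these example settings, a `delta_pitch` of 12 keys is clipped to the same value as 8 keys under linear normalization, while tanh still separates them and also spreads small deviations around 0 over a larger share of the output range. The cost is that tanh compresses large values, so precision far from 0 is reduced.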