openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

Implement melody encoder and support glide input #143

Closed yqzhishen closed 9 months ago

yqzhishen commented 9 months ago

Implementation of #142.

Experiment results

We trained pitch predictors on three single-singer datasets (Female#1, Female#2 and Male#1) to test the effect of the melody encoder.

The table below compares the maximum RPA (raw pitch accuracy, with a tolerance of 50 cents) reached after convergence (>150k steps).

| w/ base pitch | w/ melody encoder | w/ glide embedding | Female#1 | Female#2 | Male#1 |
| -- | -- | -- | -- | -- | -- |
| ✓ | × | N/A | 0.8613 | 0.6128 | 0.6073 |
| ✓ | ✓ | × | **0.8744** | 0.6575 | 0.6276 |
| × | ✓ | × | 0.8629 | **0.6879** | **0.6461** |
| × | ✓ | ✓ | - | **0.6961** | - |

The results show that the melody encoder is better suited than base pitch for carrying music score information, especially on expressive datasets. On TensorBoard, we also observed significant improvements on short slurs and long vibratos. In our internal tests, pitch predictors with the melody encoder also outperformed the old method on out-of-range notes, and remained sensitive to the music score even when it was far above the normal range (e.g. over C7 for a male singer). [Demo]
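For readers unfamiliar with the metric, here is a minimal sketch of how an RPA with a 50-cent tolerance can be computed from frame-level F0 tracks. The function name `raw_pitch_accuracy`, the Hz inputs, and the convention that a value of 0 marks an unvoiced frame are assumptions made for illustration; the evaluation code used in this PR may differ (for example, it may work in semitones or handle unvoiced frames differently).

```python
import math


def raw_pitch_accuracy(ref_hz, pred_hz, tolerance_cents=50.0):
    """Fraction of voiced reference frames whose prediction is within the tolerance.

    Assumption: both inputs are per-frame F0 values in Hz, and a value <= 0
    marks an unvoiced frame.
    """
    if len(ref_hz) != len(pred_hz):
        raise ValueError("ref_hz and pred_hz must have the same number of frames")

    voiced = 0
    correct = 0
    for ref, pred in zip(ref_hz, pred_hz):
        if ref <= 0:
            continue  # unvoiced in the reference: not scored
        voiced += 1
        if pred <= 0:
            continue  # predicted unvoiced on a voiced frame: counted as a miss
        # Pitch deviation in cents (100 cents = 1 semitone, 1200 cents = 1 octave).
        cents = 1200.0 * math.log2(pred / ref)
        if abs(cents) <= tolerance_cents:
            correct += 1

    return correct / voiced if voiced else 0.0


if __name__ == "__main__":
    ref = [440.0, 440.0, 0.0, 220.0]
    pred = [440.0 * 2 ** (25 / 1200), 440.0 * 2 ** (100 / 1200), 300.0, 220.0]
    print(raw_pitch_accuracy(ref, pred))
```

In this toy input, the first frame is 25 cents off (a hit), the second is 100 cents off (a miss), the third is unvoiced in the reference (skipped), and the fourth matches exactly, so two of the three voiced frames are counted as correct.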

Additional experiments on ornaments: the glides

With the melody encoder modeling the note sequence, we were able to introduce ornament flags into the architecture of the variance model. This time we tested glides, where the pitch rises smoothly at the beginning of a note or drops at its end. We labeled 31 notes that glide up and 75 notes that glide down in 71 minutes of data from Female#1, and left everything else unchanged. The experiment showed a slightly higher RPA with glide type embedding than the baseline. In further tests, the melody encoder with glide type embedding produced accurate and natural glides from simple glide flags, without requiring manually drawn pitch curves as before. [Demo]
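To illustrate what "simple glide flags" can look like as model input, the sketch below attaches one glide label to each note and expands the note sequence to frame level, where an embedding layer could look up the glide type ID alongside the pitch. The names `GLIDE_TYPES` and `expand_notes`, the ID assignment, and the frame-count representation of durations are hypothetical and chosen for illustration only; they are not taken from this PR's code.

```python
# Hypothetical glide vocabulary; the ID assignment is an assumption.
GLIDE_TYPES = {"none": 0, "up": 1, "down": 2}


def expand_notes(note_midi, note_frames, note_glide):
    """Expand note-level pitch and glide flags into per-frame (midi, glide_id) pairs.

    note_midi:   MIDI pitch of each note
    note_frames: duration of each note, in frames
    note_glide:  one label per note: "none", "up" or "down"
    """
    if not (len(note_midi) == len(note_frames) == len(note_glide)):
        raise ValueError("all note-level sequences must have the same length")

    frames = []
    for midi, n_frames, glide in zip(note_midi, note_frames, note_glide):
        if glide not in GLIDE_TYPES:
            raise ValueError(f"unknown glide type: {glide!r}")
        # Every frame of a note carries the note's pitch and its glide type ID.
        frames.extend([(midi, GLIDE_TYPES[glide])] * n_frames)
    return frames


if __name__ == "__main__":
    # Two notes: C4 without a glide, then D4 with an upward glide.
    print(expand_notes([60, 62], [2, 3], ["none", "up"]))
```

The point of the sketch is that the user only supplies one categorical flag per note; the shape of the resulting glide is left to the trained pitch predictor rather than being drawn by hand.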