openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

On the problem of phoneme loss when using Opencpop data set to train automatic pitch model #134

Closed Wangs-official closed 11 months ago

Wangs-official commented 11 months ago

Hello, developers. I am training a DiffSinger voicebank with the Opencpop dataset. As shown in the screenshots, the phonemes "en", "e" and "ir" are reported as missing during preprocessing. [3 screenshots]

After asking others, I learned that the phoneme "En" is not the same as the phoneme "en", and the phoneme "E" is not the same as the phoneme "e". Moreover, the phoneme "ir" is not a valid Chinese Pinyin pronunciation; it looks more like an English one. I hope the developers can follow up and fix this bug soon. Currently four phonemes are missing, preprocessing fails with an error, and the train.lengths file is not generated, so training cannot start afterwards. We are using the variance model with automatic pitch prediction.

I think it may have something to do with dictionaries.
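One way to test that suspicion is to compare the phonemes used in the transcriptions with those defined in the dictionary, in both directions and case-sensitively, so that "En" and "en" count as different tags. The sketch below is only an illustration, not DiffSinger's own check. It assumes the dictionary has one entry per line (a syllable, a tab, then space-separated phonemes) and that the transcriptions CSV has `name` and `ph_seq` columns. The function names and the sample data are hypothetical.

```python
import csv
import io


def dictionary_phonemes(dict_text):
    """Collect the phoneme set from dictionary text.

    Assumed format: "<syllable>\t<phoneme> <phoneme> ..." on each line.
    """
    phonemes = set()
    for line in dict_text.splitlines():
        line = line.strip()
        if not line:
            continue
        _syllable, _tab, phs = line.partition("\t")
        phonemes.update(phs.split())
    return phonemes


def check_phoneme_coverage(transcriptions_text, known, ignore=("AP", "SP")):
    """Compare transcription phonemes with the dictionary (case-sensitive).

    Returns (unknown, uncovered):
      unknown   - phoneme -> item names, for phonemes used but not in `known`
      uncovered - phonemes in `known` that no transcription uses
    """
    unknown = {}
    used = set()
    for row in csv.DictReader(io.StringIO(transcriptions_text)):
        for ph in row["ph_seq"].split():
            if ph in ignore:  # assumed non-lexical tags (breath, silence)
                continue
            used.add(ph)
            if ph not in known:
                names = unknown.setdefault(ph, [])
                if row["name"] not in names:
                    names.append(row["name"])
    return unknown, known - used


# Toy data for illustration only; not the real dictionary or dataset.
SAMPLE_DICT = "ye\ty E\nyan\ty En\nba\tb a\n"
SAMPLE_CSV = "name,ph_seq\nitem1,y e SP\nitem2,b a\n"

unknown, uncovered = check_phoneme_coverage(
    SAMPLE_CSV, dictionary_phonemes(SAMPLE_DICT)
)
print("used but not in dictionary:", unknown)
print("in dictionary but never used:", sorted(uncovered))
```

With real files, the text of both can be read with `open(path, encoding="utf-8").read()` and passed to the same functions. A non-empty result on either side suggests the transcriptions and the dictionary use different phoneme tags.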

Originally posted by @Wangs-official in https://github.com/openvpi/DiffSinger/issues/133

yqzhishen commented 11 months ago

For the 4 new phoneme tags added, see https://github.com/openvpi/DiffSinger/blob/main/docs/BestPractices.md#preset-dictionaries.

Here are the Opencpop transcriptions adapted to the current dictionary: transcriptions-correction.csv

By the way, I have to warn you that Opencpop's MIDI labeling is not very accurate. You may not get satisfactory results if you train a pitch predictor on it.

Wangs-official commented 11 months ago

thank you