add in ph_num using textgrid files

openvpi / MakeDiffSinger

Pipelines and tools to build your own DiffSinger dataset.

BSD 3-Clause "New" or "Revised" License

add in ph_num using textgrid files #1

Closed: blueyred closed this 1 year ago

blueyred commented 1 year ago

This is an alternative way to generate ph_num from TextGrid files for polysyllabic phoneme systems (English, Russian, etc.). It is intended for users who have already run pipelines/no_midi_preparation.ipynb and created the TextGrids.
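For illustration, the basic idea of deriving ph_num from word and phone intervals can be sketched as below. This is a minimal sketch and not the code from this PR: it assumes each TextGrid has a word tier and a phone tier, and it uses plain `(start, end, label)` tuples as hypothetical stand-ins for the parsed tiers. The function name `ph_num_from_tiers` and the sample data are made up for the example.

```python
def ph_num_from_tiers(words, phones, eps=1e-6):
    """Count how many phone intervals fall inside each word interval.

    `words` and `phones` are lists of (start, end, label) tuples, a
    hypothetical stand-in for the tiers read from a TextGrid file.
    `eps` tolerates small floating-point differences at the boundaries.
    """
    counts = []
    for w_start, w_end, _ in words:
        n = sum(
            1
            for p_start, p_end, _ in phones
            if p_start >= w_start - eps and p_end <= w_end + eps
        )
        counts.append(n)
    return counts


# Made-up sample data: "SP", the word "hello", then "AP".
words = [(0.0, 0.5, "SP"), (0.5, 1.2, "hello"), (1.2, 1.5, "AP")]
phones = [
    (0.0, 0.5, "SP"),
    (0.5, 0.6, "hh"),
    (0.6, 0.8, "ah"),
    (0.8, 0.9, "l"),
    (0.9, 1.2, "ow"),
    (1.2, 1.5, "AP"),
]
print(ph_num_from_tiers(words, phones))  # [1, 4, 1]
```

Note that this naive count follows the dictionary word boundaries only; it does not move leading consonants to the previous word.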

yqzhishen commented 1 year ago

Great work and thanks for your contribution!

By the way, have you dealt with the leading phoneme of each word? As the README says,

In singing, vowels, instead of consonants, are used to align with the beginnings of notes. For this reason, each word should start with a vowel/AP/SP, and end with leading consonant(s) of the next word (if there are any).

That is to say, we may need to know which phonemes are consonants and which are vowels to divide the words correctly. Would you please consider adding a --consonants option to take this rule into account?
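To make the rule concrete, here is a minimal sketch of one way to apply it once the consonant set is known. It is an illustration under stated assumptions, not the repository's implementation: the function name `shift_leading_consonants`, the phoneme labels, and the consonant set are all hypothetical, and each word is assumed to be given as a list of its phonemes.

```python
def shift_leading_consonants(words, consonants):
    """Regroup phonemes so each group starts with a vowel/AP/SP.

    `words` is a list of phoneme lists, one per dictionary word.
    Leading consonants of a word are moved to the end of the previous
    group. Leading consonants of the very first word have no previous
    group to join, so that word is kept unchanged.
    """
    groups = []
    for phones in words:
        # Count the consonants that precede the first non-consonant.
        lead = 0
        while lead < len(phones) and phones[lead] in consonants:
            lead += 1
        if groups:
            groups[-1].extend(phones[:lead])
            rest = phones[lead:]
        else:
            rest = list(phones)
        if rest:
            groups.append(rest)
    return groups


# Hypothetical phoneme labels and consonant set for "hello world".
consonants = {"hh", "l", "w", "d"}
words = [["SP"], ["hh", "ah", "l", "ow"], ["w", "er", "l", "d"], ["AP"]]

groups = shift_leading_consonants(words, consonants)
ph_num = [len(g) for g in groups]
print(groups)  # [['SP', 'hh'], ['ah', 'l', 'ow', 'w'], ['er', 'l', 'd'], ['AP']]
print(ph_num)  # [2, 4, 3, 1]
```

Compared with counting phonemes per dictionary word (1, 4, 4, 1 here), the regrouped counts are 2, 4, 3, 1: the "hh" of "hello" joins the preceding SP and the "w" of "world" joins "hello".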

Also, it would be better if you would write all your additional dependencies in requirements.txt and write a brief introduction to your scripts in README.md.

blueyred commented 1 year ago

Ah, I had read that vowel / consonant note but completely forgot about it while implementing this! I'll take a look and get an improved version out, with some more notes and requirements.

blueyred commented 1 year ago

I've added a simple split-on-phoneme option that reads the phones from a text file. It felt like that would offer the most flexibility: should the user want to split on "y" or "AP" etc., they could just add them to the file. I've also added info to the README.md and the requirements.

yqzhishen commented 1 year ago

Sorry for the late response.

What does split_on_phones mean? Can you explain it with an example in the README? Also, since different people use different dictionaries, is it really appropriate to add a preset split phoneme list, and which dictionary is your current split_on_phones.txt suitable for?