Motivation
The current MakeDiffSinger repository only supports automatic ph_num inference for monosyllabic phoneme systems. In a polysyllabic system, one word can contain multiple vowels and not every vowel is an onset phone, so the onset phones cannot be determined from the phone sequence alone.
However, the onset phones can still be inferred if proper extra information is provided. Useful information may include:
A list of onset vowels.
A list of non-onset phones that can be shifted/skipped at the front of each word.
Combination patterns that determine the onset phones within them.
With a graceful implementation of this idea, the algorithm could also be unified across all dictionaries, regardless of their languages and phoneme system categories.
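To make the idea concrete, here is a minimal sketch that uses only the second kind of information (a list of phones that may be skipped at the front of a word). It is not the repository's implementation: the function name `infer_ph_num`, its arguments, and the example phone sets are hypothetical. It also assumes that word boundaries are already known and that each ph_num group starts at an onset phone, so the skipped leading phones are counted with the preceding group.

```python
def infer_ph_num(words, skippable):
    """Sketch of onset-based ph_num inference (hypothetical helper).

    words:     list of non-empty phone lists, one list per word.
    skippable: set of non-onset phones that may precede the onset of a word.
    Returns one phone count per word, with each group starting at an onset.
    """
    if not words:
        return []
    if any(len(phones) == 0 for phones in words):
        raise ValueError("every word must contain at least one phone")

    onsets = []  # index of each word's onset in the flattened phone sequence
    offset = 0
    for phones in words:
        i = 0
        # Skip leading non-onset phones, but always keep one phone as the onset.
        while i < len(phones) - 1 and phones[i] in skippable:
            i += 1
        onsets.append(offset + i)
        offset += len(phones)

    # Each group spans from one onset to the next (or to the end of the sequence).
    boundaries = onsets + [offset]
    ph_num = [end - start for start, end in zip(boundaries, boundaries[1:])]
    # Phones before the very first onset have no preceding group; keep them in the first.
    ph_num[0] += onsets[0]
    return ph_num


# Polysyllabic example: "ow" in the second word is a vowel but not an onset.
words = [["SP"], ["hh", "ah", "l", "ow"], ["w", "er", "l", "d"], ["SP"]]
print(infer_ph_num(words, skippable={"hh", "w", "l", "d"}))
```

In this sketch a monosyllabic system is just the special case in which every word contains exactly one vowel. The onset vowel list and the combination patterns could be added as further checks inside the loop.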