rhasspy / tts-prompts

Phonetically balanced text to speech sentences
MIT License

nl-nl, nl-be or just nl? #3

Open koenvervloesem opened 4 years ago

koenvervloesem commented 4 years ago

I made sure to weed out most sentences that were too northern Dutch (nl-nl) or too Flemish (nl-be).

However, the name nl-nl seems to be too specific. Shouldn't it be just nl if we want to use this for both language variants?

synesthesiam commented 4 years ago

This is a good point. I need to be careful about naming the voice, etc. derived from these prompts too. But for locales, I usually only see nl_NL or nl_BE. Is there a third more generic one?

koenvervloesem commented 4 years ago

Well, the voices should definitely have nl_BE in their name if the speakers are Flemish, but I think the prompts should just use the generic locale nl, as they can be spoken by both Dutch and Flemish speakers.

However, thinking about these differences I just found some inconsistencies in the dictionary. What's the source of the nl.dict.gz file? Because this is Northern Dutch pronunciation:

politie p o ˈl i t s i

But this is Flemish pronunciation:

politiebediende p o ˈl i s i b ə ˌd i n d ə

Notice the difference: politie is pronounced p o ˈl i t s i in Northern Dutch and p o ˈl i s i in Flemish.

synesthesiam commented 4 years ago

The pronunciations come from the Dutch Wiktionary. For "politie", the IPA is / poˈli(t)si /, which contains the optional (t). It looks like I need to make the parser generate both forms (with and without the t) when it encounters optional pieces.
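
A minimal sketch of how such an expansion could work, assuming optional pieces are always written as non-nested parentheses in the IPA string. The function name `expand_optional` is hypothetical and this is not the actual parser code; it only illustrates generating every form with and without each optional piece.

```python
import itertools
import re
from typing import List

# Assumption: optional pieces are marked as "(...)" and never nested.
OPTIONAL = re.compile(r"\(([^()]*)\)")


def expand_optional(ipa: str) -> List[str]:
    """Return every form obtained by keeping or dropping each optional piece."""
    # With one capturing group, re.split alternates:
    # literal, optional, literal, optional, ..., literal
    parts = OPTIONAL.split(ipa)
    literals = parts[0::2]
    optionals = parts[1::2]

    variants = []
    # One keep/drop decision per optional piece.
    for keep in itertools.product((True, False), repeat=len(optionals)):
        out = [literals[0]]
        for flag, opt, lit in zip(keep, optionals, literals[1:]):
            if flag:
                out.append(opt)
            out.append(lit)
        variants.append("".join(out))
    return variants


print(expand_optional("poˈli(t)si"))
```

For "poˈli(t)si" this yields both poˈlitsi and poˈlisi, so the dictionary could carry the Northern Dutch and the Flemish form side by side. An entry with several optional pieces produces every combination, which may be more variants than are actually spoken.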