rhasspy / gruut-ipa

Python library for manipulating pronunciations using the International Phonetic Alphabet (IPA)
MIT License

cannot reproduce Zamia lexicon.txt entries #2

Open fquirin opened 3 years ago

fquirin commented 3 years ago

Hi Michael,

I've been experimenting with gruut-ipa and the Zamia lexicon.txt files from the popular models 'kaldi-generic-de-tdnn_250-r20190328' and 'kaldi-generic-en-tdnn_250-r20190609', and I'm having trouble getting the expected results. As far as I understand, Zamia's lexicon.txt is in SAMPA format, so I selected two words from the German file:

hallo --> h '{ l @ U
welt --> v 'E l t

and then used espeak-ng to generate phonemes:

# espeak IPA phonemes:
espeak-ng -v de -x -q --sep=" " --ipa "hallo"
h ˈa l oː
espeak-ng -v de -x -q --sep=" " --ipa "welt"
v ˈɛ l t

# espeak default phonemes:
h 'a l o:
v 'E l t

Finally, I tried to convert the espeak results to SAMPA with gruut-ipa:

python3 -m gruut_ipa convert ipa sampa "h ˈa l oː"
h "a l o:

python3 -m gruut_ipa convert espeak sampa "h 'a l o:"
h "a 5 o:

python3 -m gruut_ipa convert ipa sampa "v ˈɛ l t"
v "E l t

python3 -m gruut_ipa convert espeak sampa "v 'E l t"
v "E 5 t

But none of the results match the lexicon.txt entries. Any help or hints would be appreciated! :-)
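To see exactly where a converted pronunciation differs from a lexicon entry, a token-by-token comparison can help. The following is only a minimal sketch: `diff_phonemes` is a hypothetical helper (not part of gruut-ipa or Zamia), and it assumes phonemes are separated by single spaces as in the examples above.

```python
from typing import List, Tuple


def diff_phonemes(expected: str, actual: str) -> List[Tuple[int, str, str]]:
    """Return (position, expected_token, actual_token) for every mismatch.

    Hypothetical helper for illustration; assumes space-separated phonemes.
    """
    exp_tokens = expected.split()
    act_tokens = actual.split()
    mismatches = []
    # Walk up to the longer length so that missing or extra tokens are reported too.
    for i in range(max(len(exp_tokens), len(act_tokens))):
        e = exp_tokens[i] if i < len(exp_tokens) else ""
        a = act_tokens[i] if i < len(act_tokens) else ""
        if e != a:
            mismatches.append((i, e, a))
    return mismatches


# Values copied from the issue text above.
lexicon_entry = "v 'E l t"   # Zamia lexicon entry for "welt"
from_ipa = 'v "E l t'        # gruut-ipa output: ipa -> sampa
from_espeak = 'v "E 5 t'     # gruut-ipa output: espeak -> sampa

print(diff_phonemes(lexicon_entry, from_ipa))
print(diff_phonemes(lexicon_entry, from_espeak))
```

For "welt", the IPA route differs from the lexicon only in the stress mark on the second token, while the espeak route additionally differs in the third token (l vs. 5).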

fquirin commented 3 years ago

Actually, I'm starting to think that hallo --> h '{ l @ U in the German Zamia lexicon is simply wrong and reflects the English pronunciation :sweat_smile:. Related words include hallodri --> h a l 'o: d R i: (almost identical to German hallo) and halloween --> h '{ l @ w i n (the English hallo). That would make the espeak IPA to SAMPA pipeline with gruut almost correct, except for the apostrophe used as the stress mark.

[EDIT] To be honest, I'm confused about which symbol is correct here. According to Wikipedia and this converter, ˈ (Unicode U+02C8) is " (Unicode U+0022) in X-SAMPA, but I don't see it anywhere in the Zamia Kaldi lexicon. X-SAMPA seems to come in different flavors :see_no_evil: (e.g. Conlang X-SAMPA (CXS)).

[EDIT2] Guenter himself seems to use this mapping
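As a rough illustration of the stress-mark difference discussed in this comment (this is not Guenter's mapping, which is not reproduced here), the X-SAMPA primary stress mark " can be rewritten to the apostrophe seen in the Zamia lexicon entries. The function name `xsampa_stress_to_apostrophe` is hypothetical, and the sketch assumes space-separated tokens with the stress mark as the first character of a token; it does not touch any other symbol.

```python
def xsampa_stress_to_apostrophe(pron: str) -> str:
    """Replace a leading X-SAMPA primary stress mark (") with an apostrophe (').

    Hypothetical post-processing step; only the stress mark is changed.
    """
    converted = []
    for token in pron.split():
        if token.startswith('"'):
            # Swap only the leading stress mark, keep the rest of the token.
            token = "'" + token[1:]
        converted.append(token)
    return " ".join(converted)


# gruut-ipa outputs quoted in the issue above (ipa -> sampa).
print(xsampa_stress_to_apostrophe('v "E l t'))
print(xsampa_stress_to_apostrophe('h "a l o:'))
```

With this step, the "welt" result lines up with the lexicon entry v 'E l t. The "hallo" entry would still differ, which is consistent with the suspicion that the lexicon entry reflects the English pronunciation.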

synesthesiam commented 3 years ago

Guenter's phonemes seem to be close to X-SAMPA, but not quite the same. I have an English map for Zamia, but I will need to add a German map too :+1:

fquirin commented 3 years ago

We had some discussions about it, maybe it helps ^^: https://github.com/gooofy/zamia-speech/issues/79

My conclusion was that I kind of need a manual check: