rhasspy / gruut-ipa

Python library for manipulating pronunciations using the International Phonetic Alphabet (IPA)
MIT License

Question about a conversion #6

Open nitinthewiz opened 3 years ago

Hi,

I'm coming here from using Coqui, but I think this is a gruut issue rather than a Coqui issue, so here's my question.

For the following text - 'In the news today - Afghanistan, COVID 19, Corona Virus, Australia, India, and the USA.'

I ran the following command and got the phonemes below:

> echo 'In the news today - Afghanistan, COVID 19, Corona Virus, Australia, India, and the USA.' | python3 -m gruut en-us tokenize | python3 -m gruut en-us phonemize | jq -c ".pronunciation_text"
"ˈɪ n ð ə n j ˈu z t ə d ˈeɪ æ f ɡ ˈæ n ɪ s t ˌæ n | k ˈoʊ v ɪ d n ˈaɪ n t ˈi n | k ɚ ˈoʊ n ə v ˈaɪ ɹ ə s | ɔ s t ɹ ˈeɪ l j ə | ˈɪ n d i ə | ˈæ n d ð ə ˌʌ s ə ‖"

From what I can see, the transcription of "USA" is wrong, and the one for "Corona Virus" is iffy at best. Coqui ends up speaking "USA" as "you-sa" (with "sa" as in "sardonic") instead of as the individual letters "U", "S", "A".

How can I fix this? Is there some text cleanup I can do? I tried passing "U S A" with spaces, but that doesn't work well either.

I have the same problem with "AWS", which gets transcribed as "ɔ z".

I can see that this is based on the phonemes text file here - https://github.com/rhasspy/gruut-ipa/blob/master/gruut_ipa/data/en-us/phonemes.txt

ɔ l[aw] ɔː ɒ

Again, splitting it up doesn't help. Is there a way to get gruut to recognize that these are individual letters that need to be spoken exactly as they are written?
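
For reference, here is a rough sketch of the kind of text cleanup I have in mind, in case it helps frame the question. It replaces a hand-picked list of acronyms with respelled letter names before the text is piped to gruut. The `LETTER_NAMES` respellings and the `spell_out_acronyms` helper are hypothetical and not part of gruut or Coqui, and I am only assuming that gruut would phonemize words like "ess" or "ay" the way I expect; that would need checking by ear. The acronym list is explicit so that all-caps words like "COVID" are left alone.

```python
import re

# Hypothetical respellings of English letter names; adjust them by ear.
LETTER_NAMES = {
    "A": "ay", "B": "bee", "C": "see", "D": "dee", "E": "ee", "F": "eff",
    "G": "jee", "H": "aitch", "I": "eye", "J": "jay", "K": "kay", "L": "ell",
    "M": "em", "N": "en", "O": "oh", "P": "pee", "Q": "cue", "R": "ar",
    "S": "ess", "T": "tee", "U": "you", "V": "vee", "W": "double you",
    "X": "ex", "Y": "why", "Z": "zee",
}


def spell_out_acronyms(text, acronyms):
    """Replace each listed acronym with its letters spelled as words.

    Only whole-word, case-sensitive matches are replaced, so "USA" is
    rewritten but "USAF" or "usa" are left untouched.
    """
    if not acronyms:
        return text

    # Longest first, so a longer acronym wins over a shorter prefix.
    ordered = sorted(acronyms, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")

    def respell(match):
        # Characters without a known letter name are kept as-is.
        return " ".join(LETTER_NAMES.get(ch, ch) for ch in match.group(0))

    return pattern.sub(respell, text)


if __name__ == "__main__":
    cleaned = spell_out_acronyms(
        "COVID 19, India, and the USA. Also AWS.", {"USA", "AWS"}
    )
    print(cleaned)  # COVID 19, India, and the you ess ay. Also ay double you ess.
```

The cleaned string would then be passed to the same `python3 -m gruut en-us tokenize | python3 -m gruut en-us phonemize` pipeline as above, in place of the raw text.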