Closed by mbarnig 2 years ago
Thanks! I had forgotten to include "fa" and "sw" in the setup.py language list. Should be fixed in 1.2.2.
I cloned the latest gruut version with the modified setup.py file from GitHub, installed it with pip install . and downloaded the Persian and Swahili language files with
wget https://github.com/rhasspy/gruut/releases/download/v0.9.0/fa.tar.gz
wget https://github.com/rhasspy/gruut/releases/download/v1.2.0/sw.tar.gz
into the folders $HOME/.config/gruut/fa and $HOME/.config/gruut/sw.
The Persian language files are extracted as expected:
mbarnig@mbarnig-MS-7B22:~/.config/gruut/fa$ tar -xvf fa.tar.gz
g2p.fst
language.yml
lexicon.db
phonemes.txt
postagger.model
The Swahili language files are archived inside an sw folder:
mbarnig@mbarnig-MS-7B22:~/.config/gruut/sw$ tar -xvf sw.tar.gz
sw/
sw/__init__.py
sw/lexicon.db
sw/espeak/
sw/espeak/lexicon.db
sw/espeak/g2p/
sw/espeak/g2p/model.crf
sw/VERSION
sw/g2p/
sw/g2p/model.crf
I went up one directory level and extracted the archive there, so that the content ended up at the correct folder level. Now my test with the Swahili language works:
(rhasspy-gruut) mbarnig@mbarnig-MS-7B22:~/rhasspy-gruut/gruut$ echo 'Kaskazini Upepo na jua wali kuwa wana shindana gani iko na nguvuu kushinda mwingine, msafiri aka kuja na alikuwa anavaa koti mzito. Wali kubaliana mtu ya kwanza kutoa koti ya msafiri ndio akona nguvu kushinda ingine. Upepo ya kaskazini ika jaribu kupiga upepo yake yote, lakini akaona vigumu yake inapiga, zaidi msafiri anafunga koti yake karibu naye, mpaka upepo ya kaskazini ikajishinda. Jua ikaanza ku ngua, mpaka msafiri akatoa koti yake mara moja. Sasa Upepo ya Kaskazini ika kubali jua ikona nguvu kuishinda.' \
> | python3 -m gruut sw tokenize \
> | python3 -m gruut sw phonemize \
> | jq -c .pronunciation_text
"k ɑ s k ɑ z i n i u p ɛ p ɔ n ɑ ʄ u ɑ w ɑ l i k u w ɑ w ɑ n ɑ ʃ i ⁿɗ ɑ n ɑ ɠ ɑ n i i k ɔ n ɑ ᵑg u v u u k u ʃ i ⁿɗ ɑ m w i ᵑg i n ɛ | m s ɑ f i ɾ i ɑ k ɑ k u ʄ ɑ n ɑ ɑ l i k u w ɑ ɑ n ɑ v ɑ ɑ k ɔ t i m z i t ɔ ‖ w ɑ l i k u ɓ ɑ l i ɑ n ɑ m t u j ɑ k w ɑ ⁿz ɑ k u t ɔ ɑ k ɔ t i j ɑ m s ɑ f i ɾ i ⁿɗ i ɔ ɑ k ɔ n ɑ ᵑg u v u k u ʃ i ⁿɗ ɑ i ᵑg i n ɛ ‖ u p ɛ p ɔ j ɑ k ɑ s k ɑ z i n i i k ɑ ʄ ɑ ɾ i ɓ u k u p i ɠ ɑ u p ɛ p ɔ j ɑ k ɛ j ɔ t ɛ | l ɑ k i n i ɑ k ɑ ɔ n ɑ v i ɠ u m u j ɑ k ɛ i n ɑ p i ɠ ɑ | z ɑ i ɗ i m s ɑ f i ɾ i ɑ n ɑ f u ᵑg ɑ k ɔ t i j ɑ k ɛ k ɑ ɾ i ɓ u n ɑ j ɛ | m p ɑ k ɑ u p ɛ p ɔ j ɑ k ɑ s k ɑ z i n i i k ɑ ʄ i ʃ i ⁿɗ ɑ ‖ ʄ u ɑ i k ɑ ɑ ⁿz ɑ k u ᵑg u ɑ | m p ɑ k ɑ m s ɑ f i ɾ i ɑ k ɑ t ɔ ɑ k ɔ t i j ɑ k ɛ m ɑ ɾ ɑ m ɔ ʄ ɑ ‖ s ɑ s ɑ u p ɛ p ɔ j ɑ k ɑ s k ɑ z i n i i k ɑ k u ɓ ɑ l i ʄ u ɑ i k ɔ n ɑ ᵑg u v u k u i ʃ i ⁿɗ ɑ ‖"
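The difference between the two archives (fa.tar.gz is flat, sw.tar.gz wraps everything in an sw/ folder) can be handled in one step. Below is a minimal sketch, assuming gruut expects the files directly under $HOME/.config/gruut/<lang> as described above. The helper name extract_language is hypothetical and not part of gruut.

```python
import tarfile
from pathlib import Path


def extract_language(archive: Path, lang: str, gruut_dir: Path) -> Path:
    """Extract a language archive so that its files end up directly in
    gruut_dir/lang, whether or not the archive nests them under "<lang>/".

    Hypothetical helper for illustration; not part of gruut.
    """
    target = gruut_dir / lang
    target.mkdir(parents=True, exist_ok=True)
    prefix = lang + "/"

    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # directories are created on demand below

            name = member.name
            if name.startswith(prefix):
                # Drop the leading "<lang>/" component (the sw.tar.gz layout).
                name = name[len(prefix):]

            dest = target / name
            # Refuse entries that would land outside the target folder.
            if target.resolve() not in dest.resolve().parents:
                raise ValueError(f"unsafe path in archive: {member.name}")

            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(tar.extractfile(member).read())

    return target


# Example call (paths are assumptions based on the folders used above):
# extract_language(Path("sw.tar.gz"), "sw", Path.home() / ".config" / "gruut")
```

With this approach the same call works for both layouts, so there is no need to check by hand whether an archive has a top-level folder.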
There is, however, still a problem with the Persian language. On the first run I received a warning about installing hazm>=0.7.0, followed by this error:
UnboundLocalError: local variable 'hazm' referenced before assignment
After I installed hazm, the test with the Persian language worked:
(rhasspy-gruut) mbarnig@mbarnig-MS-7B22:~/rhasspy-gruut/gruut$ echo 'باد شمال و خورشید داشتن سر اینکه کدوم قویتر هستند بحث میکردن که یکدفعه یه مسافر که خودش رو در بالاپوش گرمی پوشونده بود پیداش شد. قرار گذاشتن که هر کدوم که بتونه اوّل مسافر رو مجبور به در آوردن بالاپوشش بکنه قویتر از اونیکیه. بعد باد شمال به شدیدترین صورتی که میتونست شروع به وزیدن کرد، ولی هرچقدر سختتر میوزید، مسافر بالاپوش رو محکمتر به دور خودش میپیچید. در آخر، باد شمال پشیمون شد و دست برداشت. بعد، خورشید شروع کرد به گرمی تابیدن، و مسافر بلافاصله بالاپوشش رو در آورد. به همین خاطر، باد شمال مجبور شد اعتراف کنه که بین اونها، خورشید قویتره.' \
> | python3 -m gruut fa tokenize \
> | python3 -m gruut fa phonemize \
> | jq -c .pronunciation_text
" ʃ o m ɒː l v æ d ɒː ʃ t æ n e̞ s æ ɾ e̞ iː n k e̞ h æ s t æ n d b æ h s k e̞ k e̞ x o d æ ʃ ɾ uː d æ ɾ b uː d ʃ o d ‖ ɢ æ ɾ ɒː ɾ k e̞ h æ ɾ k e̞ ɾ uː m æ d͡ʒ b uː ɾ b e̞ d æ ɾ ɒː v æ ɾ d æ n e̞ æ z ‖ b æ ʔ d ʃ o m ɒː l b e̞ s uː ɾ æ t iː k e̞ ʃ o ɾ uː ʔ b e̞ k o ɾ d | v æ l iː | ɾ uː b e̞ d uː ɾ e̞ x o d æ ʃ ‖ d æ ɾ ɒː x æ ɾ | ʃ o m ɒː l ʃ o d v æ d æ s t b æ ɾ d ɒː ʃ t ‖ b æ ʔ d | ʃ o ɾ uː ʔ k o ɾ d b e̞ | v æ ɾ uː d æ ɾ ɒː v æ ɾ æ d ‖ b e̞ h æ m iː n x ɒː t e̞ ɾ | ʃ o m ɒː l m æ d͡ʒ b uː ɾ ʃ o d k e̞ b e̞ j n | ‖"
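For reference, a warning followed by UnboundLocalError is what typically happens when an optional import inside a function fails, the failure is only logged, and the name is used afterwards anyway. The sketch below reproduces that pattern in isolation. It is an assumption about the kind of code involved, not gruut's actual source, and the module name _hypothetical_missing_hazm is a deliberately nonexistent stand-in for hazm.

```python
import logging

_LOGGER = logging.getLogger("optional_import_example")


def load_tagger_buggy():
    """Logs a warning when the import fails, then uses the name anyway."""
    try:
        # Stand-in for "import hazm" on a machine where hazm is not installed.
        import _hypothetical_missing_hazm as hazm
    except ImportError:
        _LOGGER.warning("hazm>=0.7.0 is required for Persian")

    # 'hazm' is a local name that was never bound, so this line raises
    # UnboundLocalError instead of a clear "please install hazm" message.
    return hazm


def load_tagger_safe():
    """Same import, but stops cleanly when the dependency is missing."""
    try:
        import _hypothetical_missing_hazm as hazm
    except ImportError:
        _LOGGER.warning("hazm>=0.7.0 is required for Persian")
        return None  # the caller can fall back or report a clear error

    return hazm
```

Returning early (or raising a descriptive exception) in the except branch would turn the confusing traceback into the warning alone.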
As I don't understand either language, I can't check whether the phonemization is correct. :smile: :laughing: :smiley:
I installed gruut version 1.2.1 on my desktop PC with
pip install gruut[fr,it,de,pt,de,sv,cs,es,nl,ru,fa,sw]
to test all supported languages by running the following script:
It works as expected for en, en-us, de, de-de, sv, sv-se, pt, pt-br, it, it-it, nl, es, es-es, cs, cs-cz, ru, ru-ru.
For fa (Persian), an assertion error is raised at line 118 of lang.py. Here is the related log:
For sw (Swahili), an assertion error is raised at line 184 of lang.py. Here is the related log: