rhasspy / gruut

A tokenizer, text cleaner, and phonemizer for many human languages.
MIT License

"TypeError: can only join an iterable" in phonemize-lexicon #3

Closed: mbarnig closed this issue 3 years ago

mbarnig commented 3 years ago

The command

$ zcat <language>_lexicon.txt.gz | gruut <language> phonemize-lexicon

generates the following error, for all supported languages:

$ zcat data/fr-fr_lexicon.txt.gz | gruut fr-fr phonemize-lexicon
Traceback (most recent call last):
  File "/home/mbarnig/.local/bin/gruut", line 8, in <module>
    sys.exit(main())
  File "/home/mbarnig/.local/lib/python3.8/site-packages/gruut/__main__.py", line 90, in main
    args.func(config, args)
  File "/home/mbarnig/.local/lib/python3.8/site-packages/gruut/__main__.py", line 641, in do_phonemize_lexicon
    word_pron_str = "".join(word_pron)
TypeError: can only join an iterable

synesthesiam commented 3 years ago

Thanks! I'll get this fixed shortly.

This line needs to be changed to:

            word_pron_str = "".join(word_pron.phonemes)
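
The error and the one-line fix can be reproduced in isolation. The following is a minimal sketch, assuming the pronunciation is an object that stores its phonemes in a `phonemes` list; `WordPronunciation` is a hypothetical stand-in written for this illustration, not gruut's actual class.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordPronunciation:
    """Hypothetical stand-in for a pronunciation object; not gruut's actual class."""

    phonemes: List[str]


word_pron = WordPronunciation(phonemes=["d", "ə", "ɹ"])

try:
    # str.join needs an iterable of strings; the object itself is not iterable
    "".join(word_pron)
except TypeError as err:
    error_message = str(err)

# Joining the list stored in the attribute works
word_pron_str = "".join(word_pron.phonemes)

print(error_message)
print(word_pron_str)
```

Passing the object fails because `str.join` iterates over its argument, while the list in `phonemes` is an ordinary iterable of strings.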

mbarnig commented 3 years ago

I changed the line and it works. Thank you. But now I get another error that I do not understand:

............
-der d ə ɹ
-e ə
-e- ə
Traceback (most recent call last):
  File "/home/mbarnig/.local/bin/gruut", line 8, in <module>
    sys.exit(main())
  File "/home/mbarnig/.local/lib/python3.8/site-packages/gruut/__main__.py", line 90, in main
    args.func(config, args)
  File "/home/mbarnig/.local/lib/python3.8/site-packages/gruut/__main__.py", line 663, in do_phonemize_lexicon
    print(word, pron_phonemes_str)
BrokenPipeError: [Errno 32] Broken pipe

synesthesiam commented 3 years ago

Can you give the exact command you're executing? This sounds like another part of your Unix pipeline exited.

mbarnig commented 3 years ago

I used the same command as shown in my first message:

zcat <language>_lexicon.txt.gz | gruut <language> phonemize-lexicon

In the meantime, I have gained a better understanding of gruut and tried the following command, without a pipe:

python3 -m gruut nl-nl phonemize-lexicon data/nl-nl_lexicon.txt.gz

This works as expected for all supported languages.

Example:

[Screenshot: gruut-phonemize-lexicon-ok]

If I add a pipe, for example

python3 -m gruut nl-nl phonemize-lexicon data/nl-nl_lexicon.txt.gz | head -25

the broken pipe error appears after printing a few lines.
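
This matches how Python behaves when the reader of its standard output exits early: `head -25` closes its end of the pipe after 25 lines, and the next write from the Python process fails with `BrokenPipeError`. A minimal sketch of one way a script can exit quietly in that case, following the pattern described in the Python documentation's note on SIGPIPE. The function `print_entries` and the sample entries are hypothetical and are not taken from gruut's code.

```python
import os
import sys


def print_entries(entries, out=sys.stdout):
    """Write one 'word phonemes' line per entry.

    Returns False if the reader closed the pipe, True otherwise.
    """
    try:
        for word, phonemes in entries:
            print(word, " ".join(phonemes), file=out)
        # Flush inside the try block so a closed pipe is detected here
        out.flush()
        return True
    except BrokenPipeError:
        return False


def main():
    # Hypothetical sample data, shaped like the output shown in this thread
    entries = [("-der", ["d", "ə", "ɹ"]), ("-e", ["ə"]), ("-e-", ["ə"])]
    if not print_entries(entries):
        # Python flushes stdout again at exit; redirect it to devnull so
        # that final flush cannot raise a second BrokenPipeError
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)


if __name__ == "__main__":
    main()
```

With handling like this, piping the output into `head` ends the script without a traceback once `head` has read the lines it wants.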

I think the problem is not specific to gruut, so I am closing the issue.

Kind regards