the01 closed this issue 5 years ago.
Please check out
https://github.com/gooofy/py-nltools/blob/master/tests/test_sequitur.py
which tests Sequitur G2P on words with umlauts.
Thanks for the hint. The test does indeed fail for me (Python 2.7.14 :: Anaconda, Inc.)
AssertionError: u'' != u"'g\u025blb-za\u026a-d\u0259-n\u0259n"
+ 'g\u025blb-za\u026a-d\u0259-n\u0259n
It looks like an encoding problem. I ended up replacing misc.run_command with delegator.py, and that got it working.
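For illustration, here is a minimal sketch of a command runner that decodes the child process's output explicitly as UTF-8 instead of relying on the locale's default encoding. `run_command_utf8` is a hypothetical helper, not the actual `misc.run_command` from py-nltools, and it assumes the encoding mismatch is on the output side. It uses only `subprocess.check_output`, so it should run on both Python 2.7 and Python 3.

```python
import subprocess

def run_command_utf8(cmd):
    """Run cmd (a list of arguments) and return its stdout as UTF-8 lines.

    Hypothetical stand-in for misc.run_command: it decodes the child's
    output explicitly instead of relying on the locale's default encoding.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    raw = subprocess.check_output(cmd)  # raw bytes from the child process
    return raw.decode("utf-8").splitlines()
```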
On a side note: I also switched to supplying the words to g2p.py via stdin (g2p.py --apply -). That way, I don't have to keep creating new temporary files.
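A minimal sketch of the stdin approach. `g2p_via_stdin` is a hypothetical helper. It assumes that `g2p.py --apply -` reads the word list from stdin (as described in the comment above), that `--model` and `--encoding` are accepted, and that each output line has the form `word<TAB>phonemes`. Check these against your Sequitur version before relying on them. Both the input and output are encoded/decoded explicitly as UTF-8, and it uses `Popen.communicate` so it should work on Python 2.7 as well as Python 3.

```python
import subprocess

def g2p_via_stdin(words, model_path, g2p_cmd=("g2p.py",)):
    """Hypothetical helper: pipe words to Sequitur's g2p.py via stdin.

    Assumptions (not verified against every Sequitur version):
    '--apply -' reads the word list from stdin, '--encoding utf-8' sets
    the I/O encoding, and each output line looks like 'word<TAB>phonemes'.
    """
    cmd = list(g2p_cmd) + ["--model", model_path,
                           "--encoding", "utf-8", "--apply", "-"]
    # Encode the word list explicitly so umlauts reach g2p.py as UTF-8.
    stdin_bytes = u"\n".join(words).encode("utf-8") + b"\n"
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(stdin_bytes)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    # Decode explicitly instead of relying on the locale's default encoding.
    result = {}
    for line in out.decode("utf-8").splitlines():
        word, sep, phonemes = line.partition(u"\t")
        if sep:  # skip lines that are not 'word<TAB>phonemes'
            result[word] = phonemes.strip()
    return result
```

A call would look like `g2p_via_stdin([u"fünfzig", u"gelb"], "path/to/model")`, where the model path is a placeholder. Since all words go through one process, this also avoids writing a temporary file per batch.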
Are you using the latest Sequitur model (20190113)?
http://goofy.zamia.org/zamia-speech/g2p/
I vaguely recall changing the grapheme encoding to UTF-8 at some point. Also make sure you are using the latest py-nltools code (which passes the UTF-8 encoding argument to Sequitur).
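A small worked example of why the grapheme encoding matters, assuming the failure comes from a UTF-8 vs. Latin-1 mismatch (the exact cause is not confirmed here). In UTF-8, `ü` takes two bytes; in Latin-1 it takes one. Reading UTF-8 bytes with the wrong codec turns one umlaut into two unrelated characters.

```python
# -*- coding: utf-8 -*-
word = u"f\u00fcnfzig"                 # "fünfzig", 7 characters
utf8_bytes = word.encode("utf-8")      # 'ü' -> two bytes, 0xC3 0xBC (8 bytes total)
latin1_bytes = word.encode("latin-1")  # 'ü' -> one byte, 0xFC (7 bytes total)

# Decoding UTF-8 bytes as Latin-1 yields u"fÃ¼nfzig": 'ü' became 'Ã' + '¼'.
mojibake = utf8_bytes.decode("latin-1")
```

If a model's grapheme inventory was built with one encoding and the input arrives in another, the umlaut can surface as symbols the model has never seen, which would explain empty or wrong output for such words.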
The Sequitur model does not seem able to handle words containing umlauts (e.g. "fünfzig") when I call sequitur_gen_ipa() directly, and I couldn't work out how zamia-speech deals with this. Is there a way to automatically generate phonemes for words with umlauts?