Open Holzhaus opened 10 years ago
I'm not terribly worried about the performance penalty; at this small scale, I don't think it will be a huge issue (but I could be wrong).
For starters: an even less ambitious goal would be to provide an easier way to configure PocketSphinx and g2p to work with other languages (this will need to be done regardless) and then let users just write their plugins in the other language.
A few notes:
A plugin written in another language won't be able to use the `isPositive` and `isNegative` functions provided by app_utils.py, nor will it be able to parse dates and times into `datetime` objects using semantic.py.
I think this is probably a good idea regardless of the multi-lingual business, though.
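To show why English-only helpers are a problem, here is a minimal sketch of a language-aware yes/no check. The keyword tables, the `language` argument and the names `is_positive`/`is_negative` are hypothetical; this is not the actual code in app_utils.py.

```python
import re

# Hypothetical per-language keyword tables; a translator only edits these.
POSITIVE_WORDS = {
    'en-US': ['yes', 'yeah', 'sure', 'ok'],
    'de-DE': ['ja', 'klar', 'okay', 'gerne'],
}
NEGATIVE_WORDS = {
    'en-US': ['no', 'nope', 'never'],
    'de-DE': ['nein', 'nee', 'niemals'],
}


def _contains_any(phrase, words):
    """True if any keyword appears as a whole word in the phrase."""
    pattern = r'\b(%s)\b' % '|'.join(re.escape(w) for w in words)
    return re.search(pattern, phrase, re.IGNORECASE) is not None


def is_positive(phrase, language='en-US'):
    return _contains_any(phrase, POSITIVE_WORDS[language])


def is_negative(phrase, language='en-US'):
    return _contains_any(phrase, NEGATIVE_WORDS[language])
```

An unknown language raises `KeyError` in this sketch; a real version would probably want a clearer error or a fallback language.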
I would like to help with this issue because I need to translate the software into my language, but I want to write something every translator can contribute to. My first thought was to create groups of words and phrases saved as constants at the beginning of every module, or even outside the module, where they would be easier to read for non-technical people. When the module runs, it would load the phrases based on a profile attribute.
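A minimal sketch of that idea, assuming a hypothetical JSON file per module that translators edit directly, and the `language` key from the profile. The file format, `load_phrases` and the fallback rule are illustrative assumptions, not existing Jasper code.

```python
import json
import os

DEFAULT_LANGUAGE = 'en-US'


def load_phrases(path, profile):
    """Load the phrase table for the language selected in the profile.

    Hypothetical file format (one JSON file per module):
        {"en-US": {"greeting": "Hello"}, "de-DE": {"greeting": "Hallo"}}
    Keys missing from the selected language fall back to DEFAULT_LANGUAGE.
    """
    with open(path, encoding='utf-8') as fp:
        table = json.load(fp)
    language = profile.get('language', DEFAULT_LANGUAGE)
    phrases = dict(table[DEFAULT_LANGUAGE])    # start from the defaults
    phrases.update(table.get(language, {}))    # override with translations
    return phrases
```

Keeping the default language as a base means a half-finished translation still produces a working (partly English) module.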
After reading #280, I understand that your vision for the 2.0 milestone goes well beyond just enabling multiple languages. I hope I can help move this software forward while reaching my own goal. I'm a professional developer, and although I've never programmed in Python, I'm willing to help if you could use a hand. Let me know.
Initial multi-language support is in PR #383 (work in progress), although you can already test it by adding this to your `profile.yml`:
language: 'de-DE' # default is 'en-US'
stt_engine: google # the only STT engine supporting German at the moment
tts_engine: google-tts # ivona-tts will work too
The only plugin that currently has German translations is the `clock` plugin. You can trigger it by saying TIME (if `language` is `en-US`) or UHRZEIT (if `language` is `de-DE`).
That PR is still word-based; I'll possibly add phrase-based parsing in a separate PR.
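A minimal sketch of how a word-based plugin can take its trigger word from a translation catalog, so the same code answers to TIME or UHRZEIT. `DictTranslations` is a hypothetical stand-in for a compiled `.mo` catalog, and `ClockPlugin`/`is_valid` are illustrative names, not the classes in the PR.

```python
import gettext
import re


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a compiled .mo catalog, backed by a dict."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Untranslated messages fall back to the English msgid, like gettext.
        return self._messages.get(message, message)


class ClockPlugin(object):
    def __init__(self, translations):
        _ = translations.gettext
        # 'TIME' is the msgid; a German catalog would map it to 'UHRZEIT'.
        self.keyword = _('TIME')

    def is_valid(self, text):
        pattern = r'\b%s\b' % re.escape(self.keyword)
        return re.search(pattern, text, re.IGNORECASE) is not None
```

With `gettext.NullTranslations()` the plugin keeps the English keyword; swapping in a German catalog changes only the data, not the plugin code.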
I assume the project currently only supports English?
German works too when using the `jasper-dev` branch (experimental). Other languages can be added by adding translations to the `.po` files.
Hi, I just tested with the `jasper-dev` branch and with:
language: 'de-DE' # default is 'en-US'
stt_engine: google # the only STT engine supporting German at the moment
tts_engine: google-tts # ivona-tts will work too
But it does not work.
I get the following error:
WARNING:jasper.application:Plugin 'unclear' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:weather
WARNING:jasper.application:Plugin 'weather' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:gmail
WARNING:jasper.application:Plugin 'gmail' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:clock
WARNING:jasper.application:Plugin 'clock' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:life
WARNING:jasper.application:Plugin 'life' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:mpdcontrol
WARNING:jasper.application:Plugin 'mpdcontrol' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:notifications
WARNING:jasper.application:Plugin 'notifications' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:birthday
WARNING:jasper.application:Plugin 'birthday' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:hn
WARNING:jasper.application:Plugin 'hn' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:news
WARNING:jasper.application:Plugin 'news' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:lightcontrol
WARNING:jasper.application:Plugin 'lightcontrol' skipped! (Reason: Unsupported Language!)
INFO:jasper.application:joke
WARNING:jasper.application:Plugin 'joke' skipped! (Reason: Unsupported Language!)
ERROR:jasper.application:No plugins for handling speech found!
Traceback (most recent call last):
File "./Jasper.py", line 5, in <module>
jasper.main()
File "/home/pi/jasper/jasper/__main__.py", line 31, in main
app = application.Jasper(use_local_mic=p_args.local)
File "/home/pi/jasper/jasper/application.py", line 188, in __init__
raise RuntimeError(msg)
RuntimeError: No plugins for handling speech found!
And with `--debug` on, I get the following (only the clock plugin, to keep it simple):
INFO:jasper.application:clock
WARNING:jasper.application:Plugin 'clock' skipped! (Reason: Unsupported Language!)
Traceback (most recent call last):
File "/home/pi/jasper/jasper/application.py", line 175, in __init__
plugin = info.plugin_class(info, self.config)
File "/home/pi/jasper/jasper/plugin.py", line 37, in __init__
self, self.info.translations, self.profile)
File "/home/pi/jasper/jasper/i18n.py", line 28, in __init__
self.__get_translations()
File "/home/pi/jasper/jasper/i18n.py", line 37, in __get_translations
raise ValueError('Unsupported Language!')
ValueError: Unsupported Language!
The .po
files are there so I do not understand what is going wrong. I'm not a python programmer but I get the basic idea of how it works. But I can't seem to figure this one out.
Hope you can help
Hi, I realized I forgot something. When you change or add a language's `.po` files, you need to run the `compile_translations.sh` script. After that, it works fine.
And to be able to run the `compile_translations.sh` script, you need to install gettext:
sudo apt-get install gettext
You'll probably also get a 403 error from Google Translate. To fix that, install the latest version of gTTS:
sudo pip install --upgrade gTTS
After that everything should work fine :-)
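Why compiling helps: gettext reads compiled `.mo` catalogs, not the `.po` sources, so a language whose `.po` file was never compiled looks unsupported. A minimal sketch of such a loader, assuming a hypothetical `languages/` directory holding `<language>.po` / `<language>.mo` pairs; `find_catalogs` and `load_translations` are illustrative names, and this is not necessarily how Jasper's `i18n.py` is laid out.

```python
import gettext
import os


def find_catalogs(languages_dir):
    """Map language codes to compiled catalogs, e.g. {'de-DE': '.../de-DE.mo'}.

    Only .mo files count; uncompiled .po sources are ignored.
    """
    catalogs = {}
    for name in os.listdir(languages_dir):
        code, ext = os.path.splitext(name)
        if ext == '.mo':
            catalogs[code] = os.path.join(languages_dir, name)
    return catalogs


def load_translations(languages_dir, language):
    """Return a GNUTranslations object, or raise if the language has no .mo."""
    catalogs = find_catalogs(languages_dir)
    if language not in catalogs:
        # A .po file without its compiled .mo ends up here.
        raise ValueError('Unsupported Language!')
    with open(catalogs[language], 'rb') as fp:
        return gettext.GNUTranslations(fp)
```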
How about multi-language support? The language could be made configurable in `profile.yml` or by using the `locale` module. But how do we translate the plugin vocabulary? I suppose that something like `gettext` can be applied to `module.WORDS`, but unfortunately, the grammar is hardcoded in the modules, too.
A possible solution
Step 1: Using phrases instead of words
We could use a list of possible phrases instead of a list of words in each module. With this approach, whole phrases will be translated and thus the grammar will still be correct:
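A minimal sketch of Step 1, with hypothetical base phrases and a hypothetical German catalog written as a plain dict. Each phrase is translated as one unit, so the German word order is preserved.

```python
# Hypothetical base phrases for a clock plugin (English msgids).
BASE_PHRASES = ['WHAT TIME IS IT', 'TELL ME THE TIME']

# Hypothetical German catalog: whole phrases are translated as units,
# so word order and grammar stay correct ("WIE SPÄT IST ES" rather than
# a word-by-word "WAS ZEIT IST ES").
DE_CATALOG = {
    'WHAT TIME IS IT': 'WIE SPÄT IST ES',
    'TELL ME THE TIME': 'SAG MIR DIE UHRZEIT',
}


def translated_phrases(catalog):
    """Translate each base phrase as a whole; untranslated ones stay English."""
    return [catalog.get(phrase, phrase) for phrase in BASE_PHRASES]


def matches(text, phrases):
    """Case-insensitive comparison of the whole utterance against each phrase."""
    return text.strip().upper() in phrases
```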
Step 2: Use variables in phrases
But what if I want to do something like:
The current (word-based) approach
With the current system, I would do something like this:
But unfortunately, this is not translatable and is a pain to parse.
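For illustration, a minimal sketch of the word-based style applied to a hypothetical alarm command. `WORDS`, `is_valid` and `extract_time` are illustrative names; the point is that the value's position is hard-coded in English-specific parsing logic.

```python
import re

# Word-based style: the plugin only knows which keyword triggers it.
WORDS = ['ALARM']


def is_valid(text):
    return re.search(r'\balarm\b', text, re.IGNORECASE) is not None


def extract_time(text):
    """Hand-written, English-only parsing of e.g. 'set an alarm for seven'.

    The rule "the value is everything after FOR" is baked into the code,
    so a German sentence ("stell einen Wecker auf sieben") would need a
    different parser, not just translated keywords.
    """
    words = text.upper().split()
    if 'FOR' not in words:
        return None
    return ' '.join(words[words.index('FOR') + 1:]) or None
```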
The phrase-based approach
But how do we do that with phrases? Probably with `str.format()` placeholders:
Sample output
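A minimal sketch of such placeholder phrases and the output they produce, using hypothetical phrases and German translations. A translator can move the placeholders around to fit the target grammar.

```python
# Hypothetical base phrases with str.format() placeholders.
BASE_PHRASES = [
    'SET AN ALARM FOR {time}',
    'REMIND ME TO {task} AT {time}',
]

# Hypothetical German translations; note that {time} moves before {task}.
DE_CATALOG = {
    'SET AN ALARM FOR {time}': 'STELLE EINEN WECKER AUF {time}',
    'REMIND ME TO {task} AT {time}': 'ERINNERE MICH UM {time} AN {task}',
}


def render(phrase, catalog, **values):
    """Translate the whole phrase first, then fill in the placeholders."""
    return catalog.get(phrase, phrase).format(**values)


for phrase in BASE_PHRASES:
    print(render(phrase, DE_CATALOG, time='7 UHR', task='DIE WÄSCHE'))
# STELLE EINEN WECKER AUF 7 UHR
# ERINNERE MICH UM 7 UHR AN DIE WÄSCHE
```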
Step 3: How to parse?
First we need to transform the base phrases into something that can be matched against another string. Unfortunately, format strings can't be matched against input out of the box (at least I think so), but we can achieve that by using regexes.
Converting base phrases to regexes
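One way to do the conversion is with `string.Formatter().parse()`, which splits a format string into literal text and field names. A minimal sketch, assuming every placeholder is named (e.g. `{time}`, not `{}` or `{0}`) and appears at most once per phrase, since regex group names must be unique identifiers; `phrase_to_regex` is an illustrative name.

```python
import re
import string


def phrase_to_regex(base_phrase):
    """Convert a str.format() phrase into an anchored regex with named groups.

    Literal text is escaped; each {field} becomes a lazy named group
    (?P<field>.+?). Matching is case-insensitive.
    """
    parts = ['^']
    for literal, field, _spec, _conv in string.Formatter().parse(base_phrase):
        parts.append(re.escape(literal))
        if field is not None:
            parts.append('(?P<%s>.+?)' % field)
    parts.append('$')
    return re.compile(''.join(parts), re.IGNORECASE)
```

The `^...$` anchors force the whole utterance to match, so the lazy groups still expand as far as needed to reach the end.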
Matching input phrases against regex phrases
Now we can match our phrase against the regex phrases and even extract the interesting values from them:
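A minimal sketch of the matching step. The regexes are written out by hand in the form the conversion step would produce, so the snippet stands alone; `REGEX_PHRASES` and `match_phrase` are illustrative names.

```python
import re

# Regexes as generated from hypothetical base phrases
# 'SET AN ALARM FOR {time}' and 'REMIND ME TO {task} AT {time}'.
REGEX_PHRASES = [
    re.compile(r'^SET AN ALARM FOR (?P<time>.+?)$', re.IGNORECASE),
    re.compile(r'^REMIND ME TO (?P<task>.+?) AT (?P<time>.+?)$', re.IGNORECASE),
]


def match_phrase(text):
    """Return (regex, extracted values) for the first matching phrase, or None."""
    for regex in REGEX_PHRASES:
        m = regex.match(text.strip())
        if m:
            return regex, m.groupdict()
    return None
```

For example, `match_phrase('set an alarm for 7 am')[1]` is `{'time': '7 am'}`.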
Step 4: Getting back from regex to base phrase
This is fairly easy: just match the regex against the base phrases.
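A minimal sketch of Step 4, assuming the regexes were generated from hypothetical `BASE_PHRASES` like the ones below. It works because each lazy `(?P<name>.+?)` group also matches the literal `{name}` text, so a regex matches its own base phrase.

```python
import re

BASE_PHRASES = ['SET AN ALARM FOR {time}', 'REMIND ME TO {task} AT {time}']


def base_phrase_for(regex):
    """Find the base phrase a regex was generated from (first match wins)."""
    for phrase in BASE_PHRASES:
        if regex.match(phrase):
            return phrase
    return None
```

If two base phrases could both match the same regex, the first one wins; storing each base phrase next to its compiled regex avoids that ambiguity.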
Step 5: Connecting actions to matched phrases
We just replace the list `BASE_PHRASES` with a list `ACTIONS` that contains tuples `(base_phrase, action)`, where `action` is a callable object (function, etc.). Of course, the above methods need to be changed accordingly.
Step 6: A working example
I provided a proof-of-concept implementation here.
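This is not the linked proof of concept, just a rough, self-contained sketch of Steps 3 to 5 combined. The phrases, `set_alarm`, `remind` and `handle` are hypothetical; extracted placeholder values are passed to the action as keyword arguments.

```python
import re
import string


def phrase_to_regex(base_phrase):
    """Escape literal text and turn each named {field} into a lazy group."""
    parts = ['^']
    for literal, field, _spec, _conv in string.Formatter().parse(base_phrase):
        parts.append(re.escape(literal))
        if field is not None:
            parts.append('(?P<%s>.+?)' % field)
    parts.append('$')
    return re.compile(''.join(parts), re.IGNORECASE)


def set_alarm(time):
    return 'Alarm set for %s.' % time


def remind(task, time):
    return 'I will remind you to %s at %s.' % (task, time)


# Step 5: (base_phrase, action) tuples instead of a bare BASE_PHRASES list.
ACTIONS = [
    ('SET AN ALARM FOR {time}', set_alarm),
    ('REMIND ME TO {task} AT {time}', remind),
]

# Compile once up front; keeping each action next to its regex makes the
# regex -> base phrase lookup from Step 4 unnecessary in this sketch.
COMPILED = [(phrase_to_regex(phrase), action) for phrase, action in ACTIONS]


def handle(text):
    """Call the first action whose phrase matches, or return None."""
    for regex, action in COMPILED:
        m = regex.match(text.strip())
        if m:
            return action(**m.groupdict())
    return None
```

Supporting another language would then mean swapping only the phrase strings in `ACTIONS` (e.g. via gettext), while the actions stay the same.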
Conclusion
In my opinion, this would not only give plugin developers an easy way to parse input, but also offer the chance to translate phrases and implement support for different languages. It also makes it possible to parse the base phrases in a way that lets us generate a grammar-based language model (I'm not an expert, but I think so). The big con is the performance penalty of the regex matching, but I think it's worth it.
What do you think?