Open Paullux opened 5 years ago
I am French and I am interested in your project. Is it possible to use a language other than English? Could you add French, for example, please?
Of course. Any help would be appreciated; English isn't even very well supported yet (#18), and there are many languages to add :)
To start on something like French support, you can put whatever you want as the command phrase, et voilà. The current design definitely has customization gaps though..
So, writing some French words on a command is easy enough, but it's also necessary to give pocketsphinx what it needs to 'listen' for French sounds. It'd have to be set up with the right phonemes, glyphs, and whatnot.
For anyone interested, pocketsphinx is what handles speech recognition in OA, and it's its own complex thing. To natively support French (let alone a multilingual agent), it'd require doing things that I only vaguely understand. Fortunately, it's an accessible project with great documentation and tools.
OA generates a corpus of terms that it finds attached to commands, and that'll work fine enough, but it uses that corpus to generate a language model and a phonetic dictionary. The language model is pretty trivial, and it will probably work as-is. The phonetic dictionary is where it gets tricky..
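For illustration only (made-up contents, not actual OA output): given a single command phrase "play music", the generated cache files would look roughly like this, with dic mapping each corpus word to CMU-style phonemes and lm holding the n-gram probabilities over the same words:
sentences.corpus:  PLAY MUSIC
dic:               PLAY   P L EY
                   MUSIC  M Y UW Z IH K
lm:                an ARPA-format n-gram model over PLAY, MUSIC, etc.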
Anyway, yes, we should support all kinds of languages and vocalizations.. there are many improvements left. Thanks for the interest -- out of curiosity, are you able to run OA?
Other bugs aside, here's what I just tried:
Added a command to minds/boot.py
@command("ça va")
def mhm():
    say("Très bien!")
As expected, no problems with the corpus or language model (well, no new problems).
The phonetic dictionary wasn't too far off:
VA V AA
ÇA AH
I changed it (and restarted OA):
VA V AA
ÇA S AH
So, boot mind can say "très bien" like an English-speaking computer on my system when it hears this approximation of "ça va". Getting the speech right is a whole other thing..
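As an aside on the synthesis side: for French output specifically, one option outside of OA is the SVOX pico tools. A minimal standalone sketch, assuming pico2wave and aplay are installed (this is not wired into OA, and say_fr is just a made-up helper name):

import os
import subprocess
import tempfile

def say_fr(text):
    # Render French speech to a temp wav with pico2wave, then play it.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        wav = f.name
    subprocess.run(["pico2wave", "-l", "fr-FR", "-w", wav, text], check=True)
    subprocess.run(["aplay", wav], check=True)
    os.remove(wav)

say_fr("Très bien !")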
I found the French dictionary on CMUSphinx's SourceForge ( https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/fr.dict/download )
I used it in oa/cache/boot and in oa/cache/root; it seems to work, but not completely.
To change the boot config:
- change oa-core-master/oa/modules/mind/minds/boot.py
- change oa-core-master/oa/cache/boot/dic
- change oa-core-master/oa/cache/boot/lm
- change oa-core-master/oa/cache/boot/sentences.corpus
To change the root config:
- change oa-core-master/oa/modules/mind/minds/root.py
- change oa-core-master/oa/cache/root/dic
- change oa-core-master/oa/cache/root/lm
- change oa-core-master/oa/cache/root/sentences.corpus
It seems to work when I change all of these files...
How can I change the synthesis voice to pico? In French, espeak is horrible...
I can add commands, but the commands aren't completely understood. How can I re-use CMUSphinx with another dictionary from CMUSphinx's development files?
Well, you're definitely on the right track. You shouldn't have to do anything to lm or sentences.corpus -- those will be overwritten/generated as OA loads. dic is the same way, but it'll need to be changed.
One change I've been wanting to make is to disable automatic generation/find a way around using an online service.
For now, you'll need to let the files get generated, stop OA, manually add entries to dic, then re-run OA.. and hope they don't get overwritten.
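A minimal sketch of that manual step, assuming the cache layout above and the WORD-then-phonemes dic format shown earlier (the script and ENTRIES are hypothetical; the phonemes are the rough approximations from before):

from pathlib import Path

# Hand-made entries: word -> CMU-style phonemes approximating French sounds.
ENTRIES = {
    "ÇA": "S AH",
    "VA": "V AA",
}

def patch_dic(dic_path):
    # Append any entries that the generated dic file is missing.
    path = Path(dic_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    existing = {line.split()[0] for line in lines if line.strip()}
    with path.open("a", encoding="utf-8") as f:
        for word, phones in ENTRIES.items():
            if word not in existing:
                f.write(f"{word} {phones}\n")

for cache_dic in ("oa/cache/boot/dic", "oa/cache/root/dic"):
    patch_dic(cache_dic)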
I don't have much experience with running a fuller language model.. I'm guessing it'll add plenty of false positives and make matching harder, but I'm not sure.
What do you mean they aren't completely understood? You've added commands, but OA doesn't seem to recognize the words and trigger the commands? Or it's not accurate?
OpenAssistant mixes up the keywords. It does not understand everything at once. I have to repeat myself, and the results are often wrong.
OA writes words to the terminal that I added in some command, but they're not really right by sound and not in the right order. For example: when I said "donne moi les nouvelles" ("give me the news"), I saw "LANCE LANCE NOUS" written in the terminal...
To show you: https://youtu.be/HYsGUvyb7Eg
Ok, yeah, that makes sense.
I've been looking into the ear module, and it's one place that needs some improvement. I added some logging to provide some visibility on what it does.. and depending on the settings, it can differ pretty wildly in how well it recognizes phrases/silence. I'll get an MR worked up soonish that uses these changes.. but I also don't want to flood the normal log.
And some user-facing feedback would be great. There are states that ear gets into where it's basically ignoring speech, or what's being spoken falls between recognition phases. And settings tend to vary between headset/internal mic and what kind of ambient noise there is (e.g. fans).
Even if speech gets recognized, it’s expecting a full phrase match.. that’s a bigger change, but it should move to intent-based interpretation that can span multiple phrases.
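To make that concrete, here's a toy sketch of the intent idea -- nothing like this exists in OA today, and the intent names, keyword sets, and 0.5 threshold are all made up:

def best_intent(recognized_words, intents):
    # Return the intent whose keyword set overlaps most with what was heard,
    # instead of requiring an exact full-phrase match.
    heard = {w.upper() for w in recognized_words}
    scored = [(len(heard & set(kw)) / max(len(kw), 1), name)
              for name, kw in intents.items()]
    score, name = max(scored)
    return name if score >= 0.5 else None  # threshold would need tuning

intents = {"news": {"DONNE", "NOUVELLES"}, "launch": {"LANCE"}}
print(best_intent(["DONNE", "MOI", "LES", "NOUVELLES"], intents))  # -> news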
I made that last comment right before your link. Thanks for sharing that video! That’s a very cool setup you have!
Impressive results, but the recognition is very frustrating :/
It’s likely that it’s mostly an audio/ear module problem vs. a language/recognizer problem, so that’s nice.
One thing to verify is the dic file you're using.. I'd avoid a full-language one while debugging, so are you using one that only has the few words that are actually used in commands? Also, check the phonetic mapping of the words to make sure they look like what you'd expect.
To explore a little deeper, check out modules/ear/__init__.py .. there are some configuration settings in there that might make a difference. Specifically ones related to energy threshold or timeouts. I'm on mobile right now, but I'll try to get a link.
Without some logging, it’s hard to know what values to use and what the detected levels are.. but maybe try 1000 or 2000 for the energy threshold: https://github.com/openassistant/oa-core/blob/30f02f70c6599fb73617483a9d3f708e0db80c4c/oa/modules/ear/__init__.py#L30
Too low will give false recognitions from background noise.
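If you want to experiment outside of OA, here's a standalone sketch using the speech_recognition package (an assumption on my part that this behaves comparably to what ear does internally):

import speech_recognition as sr

r = sr.Recognizer()
r.energy_threshold = 1500           # try values around 1000-2000
r.dynamic_energy_threshold = False  # keep it fixed while comparing values

with sr.Microphone() as source:
    print("Say something...")
    try:
        audio = r.listen(source, timeout=5, phrase_time_limit=5)
    except sr.WaitTimeoutError:
        audio = None
        print("No speech detected -- threshold may be too high.")

if audio:
    try:
        print("Heard:", r.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("Speech detected but not recognized.")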
To help, I found this site: http://www.speech.cs.cmu.edu/tools/lmtool-new.html . Can I use it, or should I not change the files in the 'cache' folder at all?
I think that’s the service the speech recognizer used to update: https://github.com/openassistant/oa-core/blob/30f02f70c6599fb73617483a9d3f708e0db80c4c/oa/modules/speech_recognition/__init__.py#L55
But running it outside of OA might be easier for now.
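For example (the numeric file names here are hypothetical -- lmtool gives its outputs a random prefix), after uploading a corpus and downloading the results:

import shutil

# Drop lmtool's output into OA's cache before starting it. Per the earlier
# note, lm may get regenerated on load, so dic is the piece that matters most.
shutil.copy("1234.dic", "oa/cache/boot/dic")
shutil.copy("1234.lm", "oa/cache/boot/lm")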