Open Paullux opened 5 years ago
I am French and I am interested in your project. Is it possible to use a language other than English? Could you add French, for example, please?
Of course. Any help would be appreciated; English isn't even very well supported yet (#18), and there are many languages to add :)
To start on something like French support, you can put whatever you want as the command phrase, et voilà. The current design definitely has customization gaps though..
So, writing some French words on a command is easy enough, but it's also necessary to give pocketsphinx what it needs to 'listen' for French sounds. It'd have to be set up with the right phonemes, glyphs, and whatnot.
For anyone interested, pocketsphinx is what handles speech recognition in OA, and it's its own complex thing. To natively support French (let alone a multilingual agent), it'd require doing things that I only vaguely understand. Fortunately, it's an accessible project with great documentation and tools.
OA generates a corpus of terms that it finds attached to commands, and that'll work fine enough, but it uses that corpus to generate a language model and a phonetic dictionary. The language model is pretty trivial, and it will probably work as-is. The phonetic dictionary is where it gets tricky..
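For illustration only (made-up contents, not actual OA output): given a single command phrase "play music", the generated cache files would look roughly like this, with dic mapping each corpus word to CMU-style phonemes and lm holding the n-gram probabilities over the same words:
sentences.corpus:  PLAY MUSIC
dic:               PLAY   P L EY
                   MUSIC  M Y UW Z IH K
lm:                an ARPA-format n-gram model over PLAY, MUSIC, etc.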
Anyway, yes, we should support all kinds of languages and vocalizations.. there are many improvements left. Thanks for the interest -- out of curiosity, are you able to run OA?
Other bugs aside, here's what I just tried:
Added a command to minds/boot.py
@command("ça va")
def mhm():
    say("Très bien!")
As expected, no problems with the corpus or language model (well, no new problems).
The phonetic dictionary wasn't too far off:
VA V AA
ÇA AH
I changed it (and restarted OA):
VA V AA
ÇA S AH
So, boot mind can say "très bien" like an English-speaking computer on my system when it hears this approximation of "ça va". Getting the speech right is a whole other thing..
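As an aside on the synthesis side: for French output specifically, one option outside of OA is the SVOX pico tools. A minimal standalone sketch, assuming pico2wave and aplay are installed (this is not wired into OA, and say_fr is just a made-up helper name):

import os
import subprocess
import tempfile

def say_fr(text):
    # Render French speech to a temp wav with pico2wave, then play it.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        wav = f.name
    subprocess.run(["pico2wave", "-l", "fr-FR", "-w", wav, text], check=True)
    subprocess.run(["aplay", wav], check=True)
    os.remove(wav)

say_fr("Très bien !")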
I found the French dictionary on CMUSphinx's SourceForge ( https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/fr.dict/download )
I used it in oa/cache/boot and in oa/cache/root; it seems to work, but not completely.
To change the boot config:
- change oa-core-master/oa/modules/mind/minds/boot.py
- change oa-core-master/oa/cache/boot/dic
- change oa-core-master/oa/cache/boot/lm
- change oa-core-master/oa/cache/boot/sentences.corpus
To change the root config:
- change oa-core-master/oa/modules/mind/minds/root.py
- change oa-core-master/oa/cache/root/dic
- change oa-core-master/oa/cache/root/lm
- change oa-core-master/oa/cache/root/sentences.corpus
It seems to work when I change all of these files...
How can I change the synthesis voice to pico? In French, espeak is horrible...
I can add commands, but the commands aren't completely understood. How can I re-use CMUSphinx with another dictionary from CMUSphinx's development files?
Well, you're definitely on the right track. You shouldn't have to do anything to lm or sentences.corpus -- those will be overwritten/generated as OA loads. dic is the same way, but it'll need to be changed.
One change I've been wanting to make is to disable automatic generation/find a way around using an online service.
For now, you'll need to let the files get generated, stop OA, manually add entries to dic, then re-run OA.. and hope they don't get overwritten.
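A minimal sketch of that manual step, assuming the cache layout above and the WORD-then-phonemes dic format shown earlier (the script and ENTRIES are hypothetical; the phonemes are the rough approximations from before):

from pathlib import Path

# Hand-made entries: word -> CMU-style phonemes approximating French sounds.
ENTRIES = {
    "ÇA": "S AH",
    "VA": "V AA",
}

def patch_dic(dic_path):
    # Append any entries that the generated dic file is missing.
    path = Path(dic_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    existing = {line.split()[0] for line in lines if line.strip()}
    with path.open("a", encoding="utf-8") as f:
        for word, phones in ENTRIES.items():
            if word not in existing:
                f.write(f"{word} {phones}\n")

for cache_dic in ("oa/cache/boot/dic", "oa/cache/root/dic"):
    patch_dic(cache_dic)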
I don't have much experience with running a fuller language model.. I'm guessing it'll add plenty of false positives and make matching harder, but I'm not sure.
What do you mean they aren't completely understood? You've added commands, but OA doesn't seem to recognize the words and trigger the commands? Or it's not accurate?
OpenAssistant mixes up the keywords. It does not understand everything at once. I have to repeat myself, and the results are often wrong.
OA writes words to the terminal that I added in some command, but they're not really right by sound and not in the right order. For example: when I said "donne moi les nouvelles" ("give me the news"), I saw "LANCE LANCE NOUS" written in the terminal...
To show you: https://youtu.be/HYsGUvyb7Eg
Ok, yeah, that makes sense.
I've been looking into the ear module, and it's one place that needs some improvement. I added some logging to provide some visibility on what it does.. and depending on the settings, it can differ pretty wildly in how well it recognizes phrases/silence. I'll get an MR worked up soonish that uses these changes.. but I also don't want to flood the normal log.
And some user-facing feedback would be great. There are states that ear gets into where it's basically ignoring speech, or what's being spoken falls between recognition phases. And settings tend to vary between headset/internal mic and what kind of ambient noise there is (e.g. fans).
Even if speech gets recognized, it’s expecting a full phrase match.. that’s a bigger change, but it should move to intent-based interpretation that can span multiple phrases.
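To make that concrete, here's a toy sketch of the intent idea -- nothing like this exists in OA today, and the intent names, keyword sets, and 0.5 threshold are all made up:

def best_intent(recognized_words, intents):
    # Return the intent whose keyword set overlaps most with what was heard,
    # instead of requiring an exact full-phrase match.
    heard = {w.upper() for w in recognized_words}
    scored = [(len(heard & set(kw)) / max(len(kw), 1), name)
              for name, kw in intents.items()]
    score, name = max(scored)
    return name if score >= 0.5 else None  # threshold would need tuning

intents = {"news": {"DONNE", "NOUVELLES"}, "launch": {"LANCE"}}
print(best_intent(["DONNE", "MOI", "LES", "NOUVELLES"], intents))  # -> news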
I made that last comment right before your link. Thanks for sharing that video! That’s a very cool setup you have!
Impressive results, but the recognition is very frustrating :/
It’s likely that it’s mostly an audio/ear module problem vs. a language/recognizer problem, so that’s nice.
One thing to verify is the dic file you're using.. I'd avoid a full-language one while debugging, so are you using one that only has the few words that are actually used in commands? Also, check the phonetic mapping of the words to make sure they look like what you'd expect.
To explore a little deeper, check out modules/ear/__init__.py .. there are some configuration settings in there that might make a difference. Specifically ones related to energy threshold or timeouts. I'm on mobile right now, but I'll try to get a link.
Without some logging, it’s hard to know what values to use and what the detected levels are.. but maybe try 1000 or 2000 for the energy threshold: https://github.com/openassistant/oa-core/blob/30f02f70c6599fb73617483a9d3f708e0db80c4c/oa/modules/ear/__init__.py#L30
Too low will give false recognitions from background noise.
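If you want to experiment outside of OA, here's a standalone sketch using the speech_recognition package (an assumption on my part that this behaves comparably to what ear does internally):

import speech_recognition as sr

r = sr.Recognizer()
r.energy_threshold = 1500           # try values around 1000-2000
r.dynamic_energy_threshold = False  # keep it fixed while comparing values

with sr.Microphone() as source:
    print("Say something...")
    try:
        audio = r.listen(source, timeout=5, phrase_time_limit=5)
    except sr.WaitTimeoutError:
        audio = None
        print("No speech detected -- threshold may be too high.")

if audio:
    try:
        print("Heard:", r.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("Speech detected but not recognized.")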
To help, I found this site: http://www.speech.cs.cmu.edu/tools/lmtool-new.html . Can I use it, or should I not change the files in the 'cache' folder at all?
I think that’s the service the speech recognizer used to update: https://github.com/openassistant/oa-core/blob/30f02f70c6599fb73617483a9d3f708e0db80c4c/oa/modules/speech_recognition/__init__.py#L55
But running it outside of OA might be easier for now.
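For example (the numeric file names here are hypothetical -- lmtool gives its outputs a random prefix), after uploading a corpus and downloading the results:

import shutil

# Drop lmtool's output into OA's cache before starting it. Per the earlier
# note, lm may get regenerated on load, so dic is the piece that matters most.
shutil.copy("1234.dic", "oa/cache/boot/dic")
shutil.copy("1234.lm", "oa/cache/boot/lm")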