synesthesiam / voice2json

Command-line tools for speech and intent recognition on Linux
MIT License

Limit valid transcriptions to only a subset during each speech input phase #32

Closed ghost closed 3 years ago

ghost commented 3 years ago

First of all, sorry for posting this as an issue as it's not an actual issue, but I didn't spot another way to talk with voice2json users/contributors.

I am currently trying to control a chess game with voice2json (with pocketsphinx as the backend). A proof-of-concept does work and it recognizes my intents if I speak very clearly, but sometimes it's also slightly off. From the chess board it's clear that the recognized sentence is not a valid move, but of course the speech recognition engine cannot know that. For example the engine might recognize "Move pawn from g3 to g4", but the pawn is on g2, so a valid move would be "g2 to g4".

Tracing how audio is transcribed, it seems that rhasspyasr_pocketsphinx/transcribe.py already returns only a single recognized sentence, not the second-, third-, and further-best recognitions.
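If an n-best list were exposed at that point, post-filtering could pick the first hypothesis that corresponds to a legal move. A hypothetical sketch of that idea (the n-best list itself is the assumption here, since transcribe.py currently returns only the top result; the sentence pattern and helper name are illustrative):

```python
import re

def pick_legal_transcription(hypotheses, legal_moves):
    """Return the first hypothesis whose (from, to) squares form a legal move.

    hypotheses: list of transcription strings, best first (hypothetical n-best).
    legal_moves: set of (from_square, to_square) pairs from the chess engine.
    """
    pattern = re.compile(r"from (\w\d) to (\w\d)")
    for text in hypotheses:
        match = pattern.search(text)
        if match and (match.group(1), match.group(2)) in legal_moves:
            return text
    return None  # no hypothesis corresponds to a legal move

# Example from the issue: top hypothesis "g3 to g4" is illegal because the
# pawn is on g2, so the second hypothesis would be accepted instead.
hyps = ["move pawn from g3 to g4", "move pawn from g2 to g4"]
print(pick_legal_transcription(hyps, {("g2", "g4")}))  # → "move pawn from g2 to g4"
```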

I saw that it is possible to limit the intents in recognize-intent, but if I understand correctly, it still has to work with the single transcription produced by transcribe-stream, so it comes too late in the pipeline.

From my point of view, there seem to be two ways I could solve my problem:

Is either of these methods possible with voice2json, or could I plug in somewhere to achieve it? I know Python, but of course if you say it goes completely against the core design of voice2json, I won't even try.

synesthesiam commented 3 years ago

Sorry to have gotten back to you so late! For now, I would go with the re-training method.
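The re-training method could amount to regenerating sentences.ini from the current set of legal moves before each listening phase, then re-running `voice2json train-profile` (a real voice2json command). A minimal sketch, assuming the chess engine supplies the legal (from, to) square pairs; the intent name, sentence wording, and file path are illustrative:

```python
# Sketch: rebuild sentences.ini so the grammar covers only currently-legal moves.

def build_sentences_ini(legal_moves):
    """Return sentences.ini text whose [Move] intent accepts only legal moves."""
    lines = ["[Move]"]
    for src, dst in legal_moves:
        # voice2json tag syntax: (value){slot_name}
        lines.append(f"move pawn from ({src}){{from}} to ({dst}){{to}}")
    return "\n".join(lines) + "\n"

# Before each listening phase (paths and engine API are assumptions):
# with open("/path/to/profile/sentences.ini", "w") as f:
#     f.write(build_sentences_ini(engine.legal_moves()))
# subprocess.run(["voice2json", "train-profile"], check=True)
```

Re-training on every move adds latency, but with a grammar this small it keeps impossible moves out of the recognizer entirely rather than filtering them out afterwards.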

In the near future, I will be investigating dynamic grammars in the Kaldi backend (something similar in spirit to kaldi-active-grammars).