t4ngo / dragonfly

ARCHIVED! - Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS) and Windows Speech Recognition (WSR)
GNU Lesser General Public License v3.0

Dragonfly does not support creating SAPI5 InProc recognition contexts #3

Closed by t4ngo 9 years ago

t4ngo commented 10 years ago

This bug report and patch have been taken from Google Code issue 15, reported by acampbell@ltufz.com on 2013-04-10.

What is the issue? Dragonfly's SAPI5 engine does not provide a facility to connect to an in-process recognition context. This means that any grammars that dfly uses are in addition to the Windows default grammars, which makes the project less than ideal for testing purposes.

What version of the product are you using? On what operating system? read-only trunk as of 2013-04-09

How do you propose to fix the issue? I've attached a patch that fixes this and another issue.

  1. Split Sapi5Engine into Sapi5InProcEngine and Sapi5SharedEngine.
  2. The Sapi5InProcEngine inherits from Sapi5SharedEngine, extends the init to call EnsureDispatch for SpInProcRecognizer, and overrides Connect to set the recognizer to InProc and select an audio source.
  3. Since the InProc Recognizer requires an audio source, a method was added to allow selection of an audio source by various criteria. At the moment, it either gets a specific audio source index (defaults to the first audio source) or searches for text in the audio source description (e.g., Realtek).
  4. The Sapi5Engine now inherits from Sapi5SharedEngine as a passthrough (so the behavior does not change)
  5. During testing, I found that /examples/dfly-loader-wsr.py referenced get_sapi5_engine, which was no longer available, so I changed it to instantiate Sapi5Engine() instead.
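
The audio source selection described in item 3 can be sketched in plain Python. This is a minimal illustration and not the code from the attached patch: `select_audio_source`, its parameters, and the case-insensitive match are all assumptions, and the list of description strings stands in for whatever the SAPI recognizer actually reports.

```python
def select_audio_source(descriptions, index=0, description=None):
    """Return the index of the audio source to use.

    descriptions: list of audio source description strings
                  (hypothetical stand-in for the recognizer's list).
    index:        source to use when no description text is given;
                  defaults to the first audio source.
    description:  optional text to search for in each description
                  (matched case-insensitively, an assumption here).
    """
    if not descriptions:
        raise ValueError("no audio sources available")

    if description is not None:
        wanted = description.lower()
        for i, text in enumerate(descriptions):
            if wanted in text.lower():
                return i
        raise ValueError("no audio source matching %r" % description)

    if not 0 <= index < len(descriptions):
        raise IndexError("audio source index %d out of range" % index)
    return index
```

Called with no criteria it picks the first source; passing `description="Realtek"` picks the first source whose description contains that text.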

I ran the tests and attached the outputs. There weren't any failures that weren't in a reference checkout of the source.

CodeOptimist commented 9 years ago

Please do in fact add this officially. Using the InProc recognition context on Windows means all the power of the built-in speech recognition engine and training, without all the stupid "minimize this, run notepad, type this" accessibility crap going ballistic.

I was using this setup with my own C# DLL and AutoHotkey until I discovered acampbell's dragonfly patch, and now I use dragonfly and Python with the AutoHotkey DLL (pyahk package). (AHK integration provides much more power and flexibility than dragonfly's commands and macros when it comes to Windows.)

It's bliss. The control is limitless. I've been running it for a couple years straight with a special trigger word setup that's biased toward false-negatives rather than false-positives (I have to speak very clearly to trigger it but it never goes off on its own.) It listens continuously while I chat, stream movies and shows, and never messes up. And when it comes time for bed: "computer, monitors off" and it's done. It's like Star Trek Enterprise up in here. But to simplify this for others his patch really needs to go official.

Actually, his patch has a slight flaw: EnsureDispatch("SAPI.SpSharedRecognizer") should not run with the InProcEngine because this opens the Windows accessibility voice control GUI, which is completely superfluous in this scenario and only causes issues (it turns itself on when the recording device goes missing - bad!). I merely commented this line out since I use the InProcEngine exclusively, but it needs to be done properly.

I don't see his patch here ... though it's attached to that Google Code issue of course. I'm going to supply an image since that's supported.

[image: voice]

CodeOptimist commented 9 years ago

Sorry I didn't see the branch by @chilimangoes ... this is my first time on github, I'm used to Mercurial. So provided that's the same patch as acampbell provided, then we just need to fix it so EnsureDispatch("SAPI.SpSharedRecognizer") only executes for Sapi5Engine and not Sapi5SharedEngine... and then maybe it can go into an official version? Maybe? Please? I hope.
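
One possible shape for that fix, sketched so that it runs without Windows: the base class dispatches nothing by itself, and each concrete engine names the single recognizer it needs. The class names follow the patch description above; the `_recognizer_name` attribute and the injected `dispatch` argument are hypothetical, with `dispatch` standing in for `win32com.client.gencache.EnsureDispatch`. This is not the code that was eventually committed.

```python
class Sapi5SharedEngine(object):
    """Base class: holds common logic but dispatches nothing by itself."""

    # COM class to dispatch; set by subclasses (hypothetical attribute).
    _recognizer_name = None

    def __init__(self, dispatch):
        # `dispatch` stands in for win32com.client.gencache.EnsureDispatch.
        self.recognizer = None
        if self._recognizer_name is not None:
            self.recognizer = dispatch(self._recognizer_name)


class Sapi5Engine(Sapi5SharedEngine):
    """Shared recognizer: the only engine that touches SpSharedRecognizer."""
    _recognizer_name = "SAPI.SpSharedRecognizer"


class Sapi5InProcEngine(Sapi5SharedEngine):
    """In-process recognizer: never dispatches the shared recognizer."""
    _recognizer_name = "SAPI.SpInProcRecognizer"
```

With this layout, constructing a `Sapi5InProcEngine` makes exactly one dispatch call, for the in-process recognizer, so the accessibility GUI tied to the shared recognizer is never started.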

jgarvin commented 9 years ago

@CodeOptimist I'd ask you by PM but apparently github removed the feature. What's so great about the autohotkey macros that's more powerful than dragonfly? Just looking to mine ideas for my own dragonfly based project, which is on Linux actually but autohotkey may still contain ideas worth stealing :)

CodeOptimist commented 9 years ago

I updated my remark to reference Windows since I wasn't being fair otherwise. @jgarvin To answer your question, AutoHotkey just shines when it comes to the Windows OS since its express purpose is to manipulate windows, capture hotkeys, parse and manipulate window controls, replace text, and so forth. It's like friendly access to the WinAPI with excellent documentation and ease of use. I'm a big fan of "do one thing and do it well" so Python for the language itself, AutoHotkey for commands, and dragonfly for speech recognition is a beautiful combination.

chilimangoes commented 9 years ago

@CodeOptimist Would you be willing to share any more details about your setup? Like how you use your trigger word in your dragonfly grammars and examples of AHK sequences and scripts that might be useful to others? (If so, it might be good to do it in another place so as not to derail this ticket)

CodeOptimist commented 9 years ago

@chilimangoes That's the plan. I thought I would encourage the addition of this to the official version first as that would make the instructions in my upcoming blog post much simpler and the entire thing available to a less technical audience.

chilimangoes commented 9 years ago

@CodeOptimist Sounds good. One possibility might be pointing people to a forked version with the patch and then updating your blog post if/when the repository here gets updated.

I haven't seen any activity from @t4ngo on this issue or a response to the pull request that I submitted. The original patch has also been around for about a year and a half. I've actually moved on from Windows Speech Recognition to Dragon NaturallySpeaking because it worked a lot better for my setup. I wonder if @t4ngo is in a similar situation and if that might be the reason why the patch hasn't been integrated into the main repository.

t4ngo commented 9 years ago

I added the SAPI5 in-process recognizer in commit a7c1a72. Thanks to acampbell@ltufz.com, @chilimangoes, and others for showing the way.

@CodeOptimist, @chilimangoes: Please try it out and let me know if you have any problems using it.

CodeOptimist commented 9 years ago

@t4ngo Wooo! This is awesome! Thanks so much! It's been working great since you committed it. @chilimangoes I can get on that blog post now, well, eventually. I need to add some other blog stuff first.