Thanks for the detailed report! I didn't even know about the existence of these alternative ways of invoking Python ... can you point me to where I can learn more about those? Sounds like a really nice way to avoid having to completely restart Dragon in order to reinitialize Python ...
Since I don't exactly know how those work, I'll instead share some generic information about how my system works in case this sparks any debugging ideas:
> Can you point me to where I can learn more about those? Sounds like a really nice way to avoid having to completely restart Dragon in order to reinitialize Python ...
Not having to restart Dragon is really nice. See the Dragonfly CLI documentation. Behind the scenes it uses `natlink.waitForSpeech()` to run the grammars out of process on their own thread. Another way to achieve this, besides the CLI, is to use one of Dragonfly's engine loaders, like dfly-loader-natlink.py. Neither of these methods launches the Natlink "messages window". It's not the official way to run Natlink, but as you can see it has some significant advantages.
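For reference, here's a rough sketch of what a loader script in that style boils down to (structure assumed from the Dragonfly examples; the real dfly-loader-natlink.py also handles directories, logging, and command-line options):

```python
# Minimal out-of-process loader sketch in the style of dfly-loader-natlink.py
# (an illustration, not the actual file contents).
import natlink
from dragonfly import get_engine
from dragonfly.loader import CommandModule

def main():
    engine = get_engine("natlink")
    engine.connect()
    try:
        # Load the grammar from an explicit path rather than MacroSystem.
        module = CommandModule(r"C:\Users\Main\Desktop\_gaze-ocr.py")
        module.load()
        # Block and process recognitions on this process's own thread;
        # closing the small waitForSpeech dialog shuts the loader down.
        natlink.waitForSpeech()
    finally:
        engine.disconnect()

if __name__ == "__main__":
    main()
```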
The only issue I've noticed running out of process: importing win32ui freezes Natlink. The freeze manifests as an absence of recognition in Dragon, and Natlink's waitForSpeech GUI becomes unresponsive. Ultimately Natlink's process has to be ended with the Task Manager; after the process is terminated, DNS resumes recognition.
There might be a simpler way to implement your workaround for threading issues with Natlink. Dane included a helper function to simplify managing threads. See here for documentation and source.

I will get back to you with further debugging.
> What is the typical failure mode for the second and third methods you described? Is it like the video you sent me separately, where the cursor does move ... but to the wrong place?
The cursor placement is off for both the click and select commands. This by and large seems to be the main failure mode and is 100% repeatable on my end. It turns out that with both the click and select failures, the behavior is the same as demonstrated in the video. For example, with the target phrase "susceptible" for both commands:
```
Start<caret start>
This is a test
Being human makes us susceptible words ceptib1le to the onset of feelings.
This is a test
<caret end>
```
For the utterance `words susceptible` (I took care to hold my gaze, with a significant pause before and after the utterance):

```
Gaze point: (1618.846591, 279.274395)
Mouse Move[1618, 254]
Mouse Move[1718, 254]
```
For the utterance `susceptible click`:

```
Gaze point: (1661.408367, 272.171501)
Mouse Move[1672, 254]
```
I suspected that in the example above there wasn't enough text to be highlighted or clicked, given how large the offset was relative to the body of text. Here's another way to visualize the issue; the example below demonstrates a selection action with the target word "susceptible".

There also seems to be a difference in the number of characters between the target word and the selected characters. Just another data point that might be helpful:
- `susceptible` is 11 characters long
- `4444444444444` is 13 characters long
The offset, in lines below the target word, grows with each utterance of the selection command (see the video). For `susceptible click`, using the same methodology as the video above, with 3 utterances of the command:
```
Gaze point: (1551.106345, 344.777151) Mouse Move[1551, 353]
Gaze point: (1599.236974, 538.596187) Mouse Move[1552, 573]
Gaze point: (1522.033729, 736.742983) Mouse Move[1551, 793]
```
The functions used to print the coordinates:

```python
class Mouse(object):
    def move(self, coordinates):
        # Debug print of the point the Mouse action is about to move to.
        print("Mouse Move[{}, {}]".format(*coordinates))

    def get_gaze_point_or_default(self):
        if self.has_gaze_point():
            # Debug print of the raw gaze point from the eyetracker API.
            print("Gaze point: (%f, %f)" % self._gaze_point[:2])
```
I wanted to take Natlink out of the speech recognition stack entirely; this simplifies the stack to just Dragonfly and your code. I did this using Dragonfly's text engine, which uses mimic to emulate spoken words as if they were recognized as a spoken utterance.
Running the text engine from the Dragonfly CLI for `_gaze-ocr.py` (your grammar):

```
python -m dragonfly test _gaze-ocr.py --delay 3
```

Type commands to emulate as if they were being dictated by voice: lowercase mimics `commands`, UPPERCASE mimics `free dictation`, and upper- and lowercase words can be mixed, e.g. `say THIS IS A TEST`. Edit the `--delay 3` in the bat file to change the command delay in seconds; the delay allows the user to switch to the relevant application to test commands.
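The same emulation can also be driven programmatically. Here's a minimal sketch of that idea (my own illustration, not the CLI's internals; it assumes the grammar module has already been loaded into the process):

```python
# Sketch: driving Dragonfly's text engine directly instead of via the CLI.
from dragonfly import get_engine

engine = get_engine("text")
engine.connect()
# Lowercase words must match command elements; UPPERCASE words are treated
# as free dictation, mirroring the CLI convention above.
engine.mimic("words SUSCEPTIBLE".split())
engine.disconnect()
```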
Mimicking `words SUSCEPTIBLE` three times, result:

```
INFO:module:CommandModule('_gaze-ocr.py'): Loading module: 'C:\Users\Main\Desktop\_gaze-ocr.py'
Eye tracker connected.
INFO:command:Calls to mimic() will be delayed by 3.00 seconds as specified
INFO:command:Enter commands to mimic followed by new lines.
words SUSCEPTIBLE
Gaze point: (1757.592047, 287.165824)
mouse move[1686, 295]
mouse move[1793, 295]
INFO:command:Mimic success for words: words SUSCEPTIBLE
words SUSCEPTIBLE
Gaze point: (1775.471045, 473.689781)
mouse move[1686, 515]
mouse move[1794, 515]
INFO:command:Mimic success for words: words SUSCEPTIBLE
words SUSCEPTIBLE
Gaze point: (1768.751352, 726.764728)
mouse move[1686, 735]
mouse move[1793, 735]
INFO:command:Mimic success for words: words SUSCEPTIBLE
```
Mimicking `SUSCEPTIBLE click` three times, result:

```
SUSCEPTIBLE click
Gaze point: (1703.507924, 290.138132)
mouse move[1739, 295]
INFO:command:Mimic success for words: SUSCEPTIBLE click
SUSCEPTIBLE click
Gaze point: (1711.555348, 491.819858)
mouse move[1739, 515]
INFO:command:Mimic success for words: SUSCEPTIBLE click
SUSCEPTIBLE click
Gaze point: (1715.925631, 675.288658)
mouse move[1740, 735]
INFO:command:Mimic success for words: SUSCEPTIBLE click
```
The behavior is very close to identical. I've also tested with and without the relevant "force Natlink to schedule background" code when utilizing the text engine.
Thanks again for the detailed information. To confirm: everything works fine when running the standard way, but not via the CLI methods?
Based on the coordinates you printed, it looks like the clicked locations are pretty well aligned with the eye tracking locations, which are taken directly from an API I don't control. That means that both of these agree on the frame of reference, but both are misaligned with the actual screen contents. The question, then, is what is causing this shift. Do you have a second monitor, or perhaps something docked on the screen that could cause this? If this is indeed only happening with the CLI version, I wonder whether the coordinates are anchored to your command prompt window location for some reason. You could try moving that window around and see if that influences the results.
> To confirm: everything works fine when running the standard way, but not via the CLI methods?
Yes, that's correct.
> You could try moving the location of that window around and see if that influences the results.

> Do you have a second monitor, or perhaps something docked on the screen that could cause this?
If it would help to have access to the machine for a first-hand look, in case you can't replicate it yourself, you're more than welcome. We could arrange a time and a remote-access method over Gitter.
I have a theory: perhaps the Mouse action is misbehaving in this configuration, and clicking on a location relative to the active window. That seems more likely than two different APIs both misbehaving the same way. Can you test some simple Mouse actions such as [0, 0] using this config? That should be the top left corner of the screen. If that doesn't reveal the issue then it'd help to have more details on what happens when you move the foreground window.
Also happy to debug remotely this weekend and/or try to repro your setup.
> Can you test some simple Mouse actions such as [0, 0] using this config? That should be the top left corner of the screen.
Mouse("[0, 0]").execute()
works as expected
Top Right
Bottom Right
Top Left
Bottom Left
I was able to reproduce this and figure out what was going on, and it was far more confusing than I ever could have guessed! The cause of the problem is Windows text scaling: if you set this to 100% you won't see any issues. As it turns out, this is broken in both method 1 and method 2, but in different ways for each! (I didn't bother testing method 3.) Here's what's going on:
Method #1: The eyetracker is returning scaled-down coordinates, the screenshot is at full scale, and the Mouse action works as you would expect (no adjustment to the coordinates it is given). Because these last two are synchronized, this mostly works ... unless you position the window towards the bottom left of the screen, in which case the cropped region of the screenshot (which is based on the eyetracker) will be offset enough that it won't include the text you want to select.

Method #2: The eyetracker is returning scaled-down coordinates, the screenshot is at full scale, and the Mouse action scales up absolute coordinates it is given in proportion to text scaling. Hence, this has the same failure mode as method #1, but in addition it will also incorrectly position the cursor due to the scaling up of the coordinates it is given.
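To make the mismatch concrete, here is the arithmetic under an assumed 125% text scaling (illustrative numbers, not values from the logs in this thread):

```python
# Illustration of the two failure modes at an assumed 125% text scaling.
scale = 1.25
physical = (1600, 500)  # true screen pixel under the gaze
reported = (physical[0] / scale, physical[1] / scale)  # eyetracker: (1280.0, 400.0)

# Method #1: the screenshot is cropped around `reported`, but the screenshot
# itself is in physical pixels, so the crop misses the target by:
crop_offset = (physical[0] - reported[0], physical[1] - reported[1])  # (320.0, 100.0)

# Method #2: the Mouse action additionally multiplies absolute coordinates by
# `scale`, so a click computed at physical pixel (x, y) actually lands at:
click = (physical[0] * scale, physical[1] * scale)  # (2000.0, 625.0)
# ... an error that grows toward the bottom right of the screen, matching the
# growing downward offset in the logs above.
```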
So, the test with [0, 0] I suggested earlier didn't reveal anything because that's the one coordinate that's not affected by scaling!
I was able to fix this for #1 by adjusting the eyetracker scaling by comparing the screen bounds it reports with what comes from the dragonfly Monitor class. That's now checked into my repository (not yet pushed out -- I'm holding off for a larger release). #2 is still broken, however, for two reasons: (1) the Monitor class returns the wrong size so the eye tracker coordinates are not scaled correctly and (2) the Mouse action still behaves inconsistently with #1. Also, even for use case #1, all bets are off if you adjust scaling after starting everything up. The behavior is extremely bizarre: the desktop resolution from Monitor gets reported incorrectly after adjusting the scale! After seeing all this, I now have full sympathy for any application on Windows that doesn't work properly with text scaling, sadly...
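The correction for #1 has roughly this shape (a sketch with assumed names; the actual change in my repository may differ):

```python
# Sketch: derive per-axis scale factors by comparing the desktop size
# reported by Dragonfly's Monitor class with the screen bounds reported by
# the eyetracker API, then scale each gaze point accordingly.
def scale_gaze_point(gaze, tracker_size, monitor_size):
    sx = monitor_size[0] / tracker_size[0]
    sy = monitor_size[1] / tracker_size[1]
    return (gaze[0] * sx, gaze[1] * sy)

# e.g. the tracker reports a 1536x864 desktop while Monitor reports 1920x1080
# (125% scaling): a gaze at (1280, 400) maps back to (1600, 500).
print(scale_gaze_point((1280, 400), (1536, 864), (1920, 1080)))
```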
In other news, thank you again for sharing these other methods of loading grammars! What's the advantage of method #3 vs. #2? Even with #2 I was able to pick up changes to Python modules made outside of my grammar. I did, however, notice that some grammars did not work properly in this mode (e.g. saying "number one three" was treated as dictation instead of typing "13", as I've set it up to do).
I removed my threading hack entirely so I can test whether Dane's fixes are enough. Seems to be working fine.
From reading up some more on how Windows handles high-DPI situations, it looks like the issue is that when Python is run from the command line, Windows treats it as a "high-DPI unaware" application, whereas when it is embedded in Dragon it is treated as "high-DPI aware" (perhaps inheriting the context from Dragon itself). Hence the core Windows APIs will behave differently. I am able to force the command-line case to be DPI aware using the following:
```python
import ctypes
...
ctypes.windll.user32.SetProcessDPIAware()
```
So far this seems to work fairly well, except it is still broken if you change the scaling after your grammar has been loaded. After I test this for a little while, I will probably submit this as a pull request so it runs inside Dragonfly and the behavior is consistent. Here's a lovely example of just how misleading these APIs are (note that this post is from 2015 and there is still no information on either of these APIs as to how they behave relative to a high-DPI display): https://social.msdn.microsoft.com/Forums/sqlserver/en-US/2dc1648d-a731-49f2-8ae5-d486644a62fb/suggestion-for-setcursorpos-and-high-dpi-displays?forum=windowsgeneraldevelopmentissues
I think this is about "as fixed as it is going to get" now with my latest change. Here's what I will likely submit as a pull request unless I run into issues: https://github.com/dictation-toolbox/dragonfly/compare/master...wolfmanstout:dpi_awareness?expand=1
There are multiple modes of DPI awareness, and this is the "most aware". This means that when running from the command line (method 2), you can even change the DPI multiple times and everything should work properly. It appears that one cannot override DPI awareness when running embedded in Dragon, so in that scenario you are limited to awareness of DPI at startup time only.
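In case it's useful context, here is roughly what opting into the "most aware" mode looks like with ctypes, with fallbacks for older Windows versions (the constants come from the Windows headers; my actual change may differ in detail):

```python
import ctypes

def set_process_dpi_aware():
    try:
        # Windows 10 1703+: DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2 (-4),
        # the "most aware" mode; it keeps working if DPI changes at runtime.
        ctypes.windll.user32.SetProcessDpiAwarenessContext(ctypes.c_void_p(-4))
    except AttributeError:
        try:
            # Windows 8.1+: PROCESS_PER_MONITOR_DPI_AWARE == 2.
            ctypes.windll.shcore.SetProcessDpiAwareness(2)
        except (AttributeError, OSError):
            # Windows Vista and later.
            ctypes.windll.user32.SetProcessDPIAware()
```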
Fixed with commit 75440ed5e6d18dae571f1f086164b8d239c43f41
I'll leave this open until the Dragonfly pull request is submitted, since that is required to address your original concern.
This reminds me of an issue we face with Mouse Grids in Caster. https://github.com/dictation-toolbox/Caster/issues/172
```python
from ctypes import windll

error_code = windll.shcore.SetProcessDpiAwareness(2)  # enable 1:1 pixel mapping
if error_code == -2147024891:
    raise OSError("Failed to set app awareness")
```
https://docs.microsoft.com/en-us/windows/win32/api/shellscalingapi/
Perhaps I can use your screen OCR package to reimplement the Legion grid so that it's cross-platform; I'd just need to extract the bounding boxes.
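Something like this hypothetical sketch is what I have in mind (the API names are from my reading of the screen-ocr package and may be off):

```python
# Hypothetical sketch: pull word bounding boxes out of screen-ocr to draw a
# Legion-style grid over on-screen text.
import screen_ocr

reader = screen_ocr.Reader.create_quality_reader()
contents = reader.read_screen((0, 0, 1920, 1080))  # left, top, right, bottom
for line in contents.result.lines:
    for word in line.words:
        # Each word carries its text plus pixel geometry for grid overlays.
        print(word.text, word.left, word.top, word.width, word.height)
```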
I will test your changes in the next few days.
> I did, however, notice that some grammars did not work properly in this mode (e.g. saying "number one three" was treated as dictation instead of typing "13", as I've set it up to do).
That's good to know! So far I have only seen a handful of people who have had issues with the dictation element; see https://github.com/dictation-toolbox/dragonfly/issues/242. The other issue is that DictListRef doesn't seem to work out of process. You asked whether there are any advantages to method #3 vs. #2. The primary advantage of #3 is doing away with the Natlink messaging window GUI. In Caster, once Natlink is released with Python 3 support, it allows me to integrate a GUI that is much more advanced: the implementation shows whether utterances are recognized as dictation or commands and displays the available commands. It would be nice to have it as a standalone that could be used with any grammar framework, but it's integrated with Caster's CCR system. Long term, it would be nice if CCR were integrated into Dragonfly.
I've tested your changes with the gaze-ocr and dragonfly libraries, and everything works as expected. Thanks for all your help with this. The issue we resolved here also helped highlight and fix a bug in the Legion mouse grid, which had the same underlying cause: a "high-DPI unaware" cmd process running the Dragonfly CLI.
I'm testing this out on Python 3.8.2 32-bit.
I've tried several different ways to run this; the 2nd and 3rd methods do not function reliably. Is there anything further I can do to help troubleshoot?

1. Running the grammar `_gaze-ocr.py` through Natlink via the traditional method works correctly. (in-process method)
2. `python -m dragonfly load --engine natlink _gaze-ocr.py --no-recobs-messages` (CLI method): The OCR seems to be working correctly (marked as success) and does indeed have a correct gaze location; however, no text is highlighted. (ocr_data.zip)
3. Out-of-process method (grammars on their own thread): what's interesting here is that several different behaviors appear.
   * The OCR seems to be working correctly (marked as success) and does indeed have a correct gaze location; however, no text is highlighted. (ocr_data.zip)
   * On occasion, regardless of repeated tests, it fails to create a selection and `Execution failed: SelectTextAction()` is produced. Examining the OCR data, it does not reflect the gaze at the time of recognition (verified through Gaze Trace). In fact, no matter where I look (e.g. the four corners of the screen), the coordinates of the gaze snapshot seem to be the same location. This is despite a significant pause after dictation.