wolfmanstout / talon-gaze-ocr

Talon scripts to enable advanced cursor control using eye tracking and OCR.
45 stars 22 forks source link

Don't spam so much to the log when matching fails #1

Open nriley opened 1 year ago

nriley commented 1 year ago

When talon-gaze-ocr can't find anything, it outputs a great deal of text to the log. For those of us who keep the log visible all the time this is somewhat distracting.

Here is an example:

2022-08-19 19:23:09    IO > click send
2022-08-19 19:23:09 ERROR     9:       talon/scripting/talon_script.py:589| 
    8:       talon/scripting/talon_script.py:297| 
    7:       talon/scripting/talon_script.py:556| 
    6:            talon/scripting/actions.py:85 | 
    5:              talon/scripting/types.py:411| 
    4: user/talon-gaze-ocr/gaze_ocr_talon.py:559| begin_generator(run())
    3: user/talon-gaze-ocr/gaze_ocr_talon.py:270| ambiguous_matches = next(generator)
    2: user/talon-gaze-ocr/gaze_ocr_talon.py:556| yield from move_cursor_to_word_generat..
    1: user/talon-gaze-ocr/gaze_ocr_talon.py:287| raise RuntimeError('Unable to find: "{}"'.format(text))
RuntimeError: Unable to find: "TimestampedText(text='send', start=4094.0116906937633, end=4094.3043736205927)"

[The below error was raised while handling the above exception(s)]
2022-08-19 19:23:09 ERROR cb error topic="phrase" cb=<bound method SpeechSystem.engine_event of <talon.scripting.speech_system.SpeechSystem object at 0x10e5ee950>>
   30:       lib/python3.9/threading.py:937* # cron thread
   29:       lib/python3.9/threading.py:980* 
   28:       lib/python3.9/threading.py:917* 
   27:                    talon/cron.py:155| 
   26:                    talon/cron.py:106| 
   25:                     talon/vad.py:21 | 
   24: talon/scripting/speech_system.py:356| 
   23:             talon/engines/w2l.py:709| 
   22:      talon/scripting/dispatch.py:105| 
   21:      talon/scripting/dispatch.py:144| 
   20:      talon/scripting/dispatch.py:133| 
   19:          talon/scripting/rctx.py:236| # 'phrase' main:_redispatch()
   18: talon/scripting/speech_system.py:63 | 
   17:      talon/scripting/dispatch.py:105| 
   16:      talon/scripting/dispatch.py:144| 
   15:      talon/scripting/dispatch.py:133| 
   14: ------------------------------------# [stack splice]
   13:          talon/scripting/rctx.py:236| # 'phrase' main:engine_event()
   12: talon/scripting/speech_system.py:410| 
   11:       talon/scripting/actions.py:85 | 
   10:         talon/scripting/types.py:411| 
    9:     talon/scripting/core/core.py:103| 
    8:       talon/scripting/actions.py:85 | 
    7:         talon/scripting/types.py:411| 
    6:     talon/scripting/core/core.py:132| 
    5:       talon/scripting/actions.py:85 | 
    4:         talon/scripting/types.py:411| 
    3:     talon/scripting/core/core.py:139| 
    2:  talon/scripting/talon_script.py:690| 
    1:  talon/scripting/talon_script.py:593| 
talon.scripting.talon_script.TalonScriptError: 
 in script at /Users/nicholas/.talon/user/talon-gaze-ocr/gaze_ocr.talon:75:
 > user.click_text(timestamped_prose)
RuntimeError: Unable to find: "TimestampedText(text='send', start=4094.0116906937633, end=4094.3043736205927)"
wolfmanstout commented 1 year ago

All this spam is simply the result of raising one exception, which I do as a way to safely stop execution of a command wherever I am in the code. Do you know if there's another simple mechanism to do this, like a sort of "exit" action that raises a special exception type that doesn't trigger Talon logspam? Then I could just log something terse and exit(). I guess in my case I do happen to have one place where everything rolls up to, where I could catch exceptions and log the errors (the begin_generator function).

wolfmanstout commented 1 year ago

Reverting the earlier solution, as this swallows the exception and would allow chained commands to run. Based on my conversation with Ryan above, raising an exception seems to be the best option available right now, and it also aligns with how Cursorless handles errors.

nriley commented 1 year ago

It's not quite as bad with the recent Talon beta changes.

2022-08-27 12:11:02    IO > click it go
2022-08-27 12:11:02 ERROR     7:       talon/scripting/talon_script.py:590| 
    6:       talon/scripting/talon_script.py:298| 
    5:            talon/scripting/actions.py:88 | 
    4: user/talon-gaze-ocr/gaze_ocr_talon.py:551| begin_generator(run())
    3: user/talon-gaze-ocr/gaze_ocr_talon.py:275| ambiguous_matches = next(generator)
    2: user/talon-gaze-ocr/gaze_ocr_talon.py:548| yield from move_cursor_to_word_generat..
    1: user/talon-gaze-ocr/gaze_ocr_talon.py:292| raise RuntimeError('Unable to find: "{}"'.format(text))
RuntimeError: Unable to find: "TimestampedText(text='it go', start=10588.301305670497, end=10588.722442034134)"

[The below error was raised while handling the above exception(s)]
2022-08-27 12:11:02 ERROR cb error topic="phrase" cb=SpeechSystem.engine_event
   15:       lib/python3.9/threading.py:937* # cron thread
   14:       lib/python3.9/threading.py:980* 
   13:       lib/python3.9/threading.py:917* 
   12:                    talon/cron.py:151| 
   11:                     talon/vad.py:21 | 
   10: talon/scripting/speech_system.py:362| 
    9:             talon/engines/w2l.py:709| 
    8:      talon/scripting/dispatch.py:134| # 'phrase' main:EngineProxy._redispatch()
    7: talon/scripting/speech_system.py:64 | 
    6: ------------------------------------# [stack splice]
    5:      talon/scripting/dispatch.py:134| # 'phrase' main:SpeechSystem.engine_event()
    4: talon/scripting/speech_system.py:415| 
    3:      talon/scripting/executor.py:110| 
    2:  talon/scripting/talon_script.py:691| 
    1:  talon/scripting/talon_script.py:594| 
talon.scripting.talon_script.TalonScriptError: 
 in script at /Users/nicholas/.talon/user/talon-gaze-ocr/gaze_ocr.talon:75:
 > user.click_text(timestamped_prose)
RuntimeError: Unable to find: "TimestampedText(text='it go', start=10588.301305670497, end=10588.722442034134)"

To step back a bit — there are generally two common reasons talon-gaze-ocr fails:

1) Failure to interpret what I am saying (in the above case, it inserted an "it" between "click" and "go") 2) Failure of eye tracking to interpret where I am looking

It'd be good to figure out how to surface this information, even if it's outside the log. This likely should be filed as another issue — perhaps, like with cursorless, talon-gaze-ocr should either display more detailed information in the failure overlay or use a notification.

One other thing I noticed with the above is that a recording didn't get saved, so I couldn't figure out how Talon interpreted my speech as "click it go". I wonder if this is normal in Talon when exceptions are raised? I've never checked…

wolfmanstout commented 1 year ago

Indeed, recordings are not saved when an exception is raised (I just confirmed that with Cursorless as well). I'm going to bring this up next week when I sync up with Ryan to see if that might be possible to change. That seems like the most useful thing to do from a debugging standpoint (note that I do already show the recognized ASR text at the top of the screen as part of the debugging overlay, and you can adjust the pause with user.ocr_debug_display_seconds). In terms of fixing the underlying issue, I think that the upcoming "dynamic list" feature will be the most helpful because we can bias recognition towards what is seen onscreen.

Regarding the second issue of eye tracking: I am currently using the raw gaze output from Talon. The best long term fix is probably to use Ryan's filtered/predicted gaze. I'm going to check with him if it might be possible to use that without enabling cursor control.

wolfmanstout commented 1 year ago

Also, from a logging standpoint, what did you have in mind for better understanding eye tracking failures? Currently the debugging overlay is designed to show you what it perceived. I guess ideally we would have something like a video to better understand this, or perhaps logged data that could be used to investigate dynamic search radius adjustments. Practically speaking, I think we are most likely to benefit from the in-depth work that Ryan is already doing to improve eye tracking, though.

nriley commented 1 year ago

I find the debugging overlay a bit overwhelming — trying to take in that much information with the default delay and figuring out which of the failure modes I am triggering is difficult. Maybe this will improve with practice, but for now I just treat it as a "fail" and try saying/looking again.

I guess failure to correctly OCR the onscreen text would be another thing to identify, but I've never had this happen in practice. This may be partially because I've only used talon-gaze-ocr on the Mac.

wolfmanstout commented 1 year ago

Makes sense. Consider bumping up the delay as noted above. If you want to repeat the command you can do so before the overlay has disappeared (it will be hidden before running another OCR round).