wolfmanstout / talon-gaze-ocr

Talon scripts to enable advanced cursor control using eye tracking and OCR.

Feature request: integrate accessibility API as another source of labels and text #16

Open paj80paj opened 1 year ago

paj80paj commented 1 year ago

I would like to request the ability to combine Homerow, Talon, and gaze OCR using ordinary webcams. I'm thinking that Homerow would make the problem of eye tracking much easier by giving a discrete set of options. Eye movement could then result in a different letter-pair label being highlighted, and then you could say a specific command to click on that label.

https://github.com/dexterleng/homerow#user-guide

wolfmanstout commented 1 year ago

Thanks for sharing this idea, let me make sure I understand. Looks like Homerow offers at least a couple modes of use: one where you show two-letter labels and another where you type names of things and onscreen text. I’d say my package works very much like the latter already: it snaps the mouse clicks directly onto the text that you speak, so it would work well even with a lower-precision eye tracker. It doesn’t have labels for named icons the way that Homerow does, but I could add this by integrating with the accessibility API (something I’d like to do long term; I totally agree it would work well with this). Is that what you’re looking for?
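To illustrate the snapping idea described above, here is a minimal sketch. It is not the package's actual code: `OcrWord`, `snap_to_spoken_text`, and the sample coordinates are hypothetical names and values chosen for illustration. It assumes OCR has already produced word bounding boxes and that the eye tracker supplies a rough gaze point.

```python
import math
from dataclasses import dataclass


@dataclass
class OcrWord:
    """Hypothetical OCR result: recognized text plus its bounding box in pixels."""
    text: str
    left: float
    top: float
    width: float
    height: float

    @property
    def center(self):
        return (self.left + self.width / 2, self.top + self.height / 2)


def snap_to_spoken_text(words, spoken, gaze):
    """Return the center of the word matching `spoken` that lies nearest to `gaze`.

    Returns None when no on-screen word matches the spoken text.
    """
    target = spoken.strip().lower()
    matches = [w for w in words if w.text.lower() == target]
    if not matches:
        return None
    # The gaze point only disambiguates between matches, so it can be imprecise.
    best = min(matches, key=lambda w: math.dist(w.center, gaze))
    return best.center


words = [
    OcrWord("Save", 100, 50, 40, 20),
    OcrWord("Cancel", 200, 50, 60, 20),
    OcrWord("Save", 900, 700, 40, 20),
]
noisy_gaze = (850.0, 640.0)  # roughly 100 px away from the intended word
click_point = snap_to_spoken_text(words, "save", noisy_gaze)
```

Because the click lands on the matched word's center rather than on the raw gaze point, the gaze estimate only has to be closer to the intended match than to any other occurrence of the same text.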

Sounds like you are also suggesting adding support for two-letter labels. At that point I don’t think you’d even need the eye tracker, right? You could use Homerow and Talon for that today; you don’t need my package.

Sounds like you’re also suggesting adding eye tracking support using only a webcam. Do you have examples of this in the wild?

paj80paj commented 1 year ago

My comment about the two-letter labels was only to describe how Homerow currently works. I guess I'm saying that the accessibility API is complementary to your OCR, and I'm wondering whether it could improve the accuracy of gaze OCR on that platform. One way to put this is that the API offers some ground truth that could be used to calibrate any model.

wolfmanstout commented 1 year ago

Yes, absolutely. I'm not planning on picking this up soon, but I do agree that integrating the accessibility API as another source of labels and text would be a nice extension to the package.
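One way such an integration might look, as a rough sketch only: treat OCR words and accessibility labels as one pool of click targets, and prefer the accessibility entry when both sources describe the same on-screen text. Everything here is an assumption for illustration. `Label`, `overlaps`, and `merge_labels` are hypothetical names, and the accessibility labels are assumed to have been fetched already by some platform-specific code that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical click target from either source."""
    text: str
    rect: tuple  # (left, top, right, bottom) in screen pixels
    source: str  # "ocr" or "accessibility"


def overlaps(a, b):
    """True when two (left, top, right, bottom) rectangles intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_labels(ocr_labels, ax_labels):
    """Combine both sources, dropping OCR entries duplicated by an accessibility label."""
    merged = list(ax_labels)
    for ocr in ocr_labels:
        duplicate = any(
            ocr.text.lower() == ax.text.lower() and overlaps(ocr.rect, ax.rect)
            for ax in ax_labels
        )
        if not duplicate:
            merged.append(ocr)
    return merged


ocr_labels = [
    Label("Save", (100, 50, 140, 70), "ocr"),
    Label("Hello world", (100, 200, 220, 220), "ocr"),
]
ax_labels = [
    Label("Save", (98, 48, 142, 72), "accessibility"),
    # A named icon with no visible text, which OCR alone cannot find.
    Label("Settings", (300, 50, 320, 70), "accessibility"),
]
merged = merge_labels(ocr_labels, ax_labels)
```

In this sketch the accessibility entry wins on overlap because its text and bounds come from the application itself rather than from image recognition, while OCR still covers text that has no accessibility label.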