Feature request: commands that work with paragraphs

wolfmanstout / talon-gaze-ocr

Talon scripts to enable advanced cursor control using eye tracking and OCR.


Feature request: commands that work with paragraphs #18

Open paj80paj opened 1 year ago

paj80paj commented 1 year ago

I have a Tobii eye tracker, an M1 Mac, and Talon. Would this package let me look at a piece of text and have the containing paragraph highlighted, so that I can screen-grab it and OCR it into text on the clipboard? I would also like the same ability when looking at a diagram: have the diagram recognised as an object, then selected and copied to the clipboard as an image. I can trigger the copying with a Talon voice command.

wolfmanstout commented 1 year ago

These features do not currently exist. The closest would be that you could say something like "select [first word] through [last word]" to select the contents of a paragraph. But there isn't anything that automatically grabs the paragraph just based on a single word inside of it. That's a reasonable feature request. As for copying a diagram as an image, that sounds out of scope for this package because it doesn't have anything to do with OCR or text selection.

paj80paj commented 1 year ago

Thanks for the reply. Does the OCR include edge detection to find the edges of the paragraph? Could I fork it to do so?

paj80paj commented 1 year ago

Actually, I don't think it needs the edges if it just has all the words and I can copy them to the clipboard with a Talon command.

wolfmanstout commented 1 year ago

It currently only has a concept of lines, not paragraphs. You'd have to define some heuristics to group lines into paragraphs.
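As an illustration of what such a heuristic might look like, here is a minimal sketch that groups lines by vertical gap and left-edge alignment. The `Line` class, the function name, and both thresholds are hypothetical and are not part of talon-gaze-ocr; the sketch also assumes a single column of text with lines given as pixel bounding boxes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    """Hypothetical OCR line: its text plus a bounding box in screen pixels."""
    text: str
    left: float
    top: float
    bottom: float

    @property
    def height(self) -> float:
        return self.bottom - self.top


def group_lines_into_paragraphs(
    lines: List[Line],
    max_gap_ratio: float = 0.6,     # assumed threshold, tune per font and app
    max_indent_ratio: float = 2.0,  # assumed threshold, tune per font and app
) -> List[List[Line]]:
    """Group lines top to bottom; start a new paragraph on a large vertical
    gap or a large shift of the left edge relative to the previous line."""
    paragraphs: List[List[Line]] = []
    for line in sorted(lines, key=lambda l: l.top):
        if paragraphs:
            prev = paragraphs[-1][-1]
            gap = line.top - prev.bottom
            shift = abs(line.left - prev.left)
            same_paragraph = (
                gap <= max_gap_ratio * prev.height
                and shift <= max_indent_ratio * prev.height
            )
            if same_paragraph:
                paragraphs[-1].append(line)
                continue
        paragraphs.append([line])
    return paragraphs


# Two tightly spaced lines, a blank-line-sized gap, then two more lines.
example_lines = [
    Line("first line", 10, 0, 20),
    Line("second line", 10, 24, 44),
    Line("third line", 10, 80, 100),
    Line("fourth line", 10, 104, 124),
]
example_paragraphs = group_lines_into_paragraphs(example_lines)
```

Scaling the thresholds by line height keeps the heuristic independent of font size. Multi-column layouts would need the lines split into columns before grouping.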

paj80paj commented 1 year ago

Thanks a lot for your responsiveness. Can you see on the screen which lines are identified by the OCR? When you mentioned the first word and last word above, did you mean the first and last of the line or of the paragraph?

wolfmanstout commented 1 year ago

Yes, there is a command that lets you see what OCR sees ("ocr show"). And in my example above, I am referring to the first and last words of the paragraph. It will just simulate clicks before and after these words, and the OS is responsible for selecting everything in between. I would recommend you just try the package out and look at the blog entry linked from the README, which has examples to try.
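To make the "click before and after" idea concrete, here is a rough sketch of how the two click positions could be derived from word bounding boxes. The `Word` class and `selection_click_points` are hypothetical names for illustration, not the package's actual code, and the sketch only computes coordinates. Performing the clicks (for example a click followed by a shift-click, which is an assumption here) would be left to whatever automation layer is in use.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class Word:
    """Hypothetical OCR word: its text plus a bounding box in screen pixels."""
    text: str
    left: float
    right: float
    top: float
    bottom: float


def selection_click_points(first: Word, last: Word) -> Tuple[Point, Point]:
    """Return a point at the left edge of the first word and a point at the
    right edge of the last word, each at the vertical middle of its word."""
    start = (first.left, (first.top + first.bottom) / 2)
    end = (last.right, (last.top + last.bottom) / 2)
    return start, end


# Example: a paragraph starting with "Hello" and ending with "end."
first_word = Word("Hello", left=100, right=150, top=20, bottom=40)
last_word = Word("end.", left=300, right=340, top=80, bottom=100)
start_point, end_point = selection_click_points(first_word, last_word)
```

Because the OS handles the selection between the two points, the words can sit on different lines, and nothing in between needs to be recognized correctly.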