ogallagher / terry

Terry the virtual secreTERRY

Text recognition #13

Open ogallagher opened 4 years ago

ogallagher commented 4 years ago

A major part of using the GUI, in addition to recognition of icons/images, is being able to read text labels (optical character recognition, or OCR). I’ll first research different text recognition libraries and look for the one that best suits my needs.

ogallagher commented 4 years ago

Google Cloud Vision OCR API: returns the full detected text as well as the individual words with their bounding boxes. The pricing is reasonable for my development and testing needs:

[Screenshot: Google Vision OCR pricing table]
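In a TEXT_DETECTION response, the first entry of `textAnnotations` holds the full text and the following entries hold individual words, each with a `boundingPoly` made of vertices. To use those word boxes for pointing at widgets, they need to become plain rectangles. A minimal sketch, assuming the vertices have already been parsed into x/y arrays (the class and method names are hypothetical), that collapses a polygon into the smallest enclosing `java.awt.Rectangle`. Missing coordinates should be passed as 0, since the JSON encoding can leave out zero-valued fields.

```java
import java.awt.Rectangle;

class BoundingBoxes {
    // Collapse a polygon (e.g. the four vertices of a word's boundingPoly)
    // into the smallest axis-aligned rectangle that contains it.
    // Callers should pass 0 for any coordinate missing from the JSON response.
    static Rectangle toRectangle(int[] xs, int[] ys) {
        if (xs.length == 0 || xs.length != ys.length) {
            throw new IllegalArgumentException("need matching, non-empty x and y arrays");
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int i = 0; i < xs.length; i++) {
            minX = Math.min(minX, xs[i]);
            maxX = Math.max(maxX, xs[i]);
            minY = Math.min(minY, ys[i]);
            maxY = Math.max(maxY, ys[i]);
        }
        return new Rectangle(minX, minY, maxX - minX, maxY - minY);
    }

    public static void main(String[] args) {
        // Vertices listed clockwise from the top-left corner of a word
        Rectangle r = toRectangle(new int[] {10, 60, 60, 10}, new int[] {5, 5, 25, 25});
        System.out.println(r); // java.awt.Rectangle[x=10,y=5,width=50,height=20]
    }
}
```

Taking the min/max over all vertices also works when the text is slightly rotated, at the cost of a slightly larger box.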

To add it to my project, here’s the Maven pom.xml dependency entry:

[Screenshot: pom.xml dependency entry]

ogallagher commented 4 years ago

Tesseract OCR Library: this project has also been backed by Google, but it works best on clear, computer-generated text, and its basic usage returns only the recognized text string, without information about location or bounds.

ogallagher commented 4 years ago

Aspose OCR: an OCR library for Java applications with good functionality, but fairly steep pricing.

ogallagher commented 4 years ago

As of this morning I’ve been able to run one successful test request to the Google Vision API through the Widget class, sending a small corner of the screen and receiving a short list of text boxes in return.
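The Widget class's request code isn't shown here, and the project pulls in Google's Java client library through Maven. As an alternative illustration, here is a sketch of the first half of that round trip using only the JDK and the public REST format of `images:annotate`: crop a corner of a screenshot, PNG-encode it, Base64-encode it, and wrap it in a `TEXT_DETECTION` request body. The class and method names are hypothetical, a blank image stands in for a real screen capture, and authentication and the HTTP call itself are left out.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

class VisionRequestSketch {
    // Crop the top-left corner of a screenshot (clamped to the image size).
    // In the real app the screenshot could come from java.awt.Robot#createScreenCapture.
    static BufferedImage topLeftCorner(BufferedImage screen, int width, int height) {
        return screen.getSubimage(0, 0,
                Math.min(width, screen.getWidth()),
                Math.min(height, screen.getHeight()));
    }

    // Encode an image as PNG, then Base64, the form images:annotate
    // accepts for inline image content.
    static String toBase64Png(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Build the JSON body for POST https://vision.googleapis.com/v1/images:annotate
    // asking for TEXT_DETECTION. Standard Base64 output needs no JSON escaping.
    static String textDetectionRequest(String base64Image) {
        return "{\"requests\":[{\"image\":{\"content\":\"" + base64Image
                + "\"},\"features\":[{\"type\":\"TEXT_DETECTION\"}]}]}";
    }

    public static void main(String[] args) {
        // Blank stand-in for a real screenshot, so the sketch also runs headless.
        BufferedImage screen = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        BufferedImage corner = topLeftCorner(screen, 200, 100);
        String body = textDetectionRequest(toBase64Png(corner));
        System.out.println(body.length() + " characters of request JSON");
    }
}
```

Sending only a corner of the screen keeps each request small, which matters when every call counts against the OCR quota.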

ogallagher commented 4 years ago

So far, when testing on Windows, the code that reads the Google API client credentials has not worked.
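The cause of the Windows failure isn't identified here. One frequent source of this kind of cross-platform bug is a credentials path built with hard-coded `/` separators, or an environment variable value that still carries the quotes it was set with. A minimal sketch, assuming the location comes from the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable with a hypothetical fallback file under the user's home directory, that builds the path with `java.nio.file` so the separators match the OS (`CredentialsLocator` and the fallback file name are illustrative only):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CredentialsLocator {
    // envValue: value of GOOGLE_APPLICATION_CREDENTIALS (may be null or blank).
    // userHome: value of the user.home system property.
    static Path resolve(String envValue, String userHome) {
        if (envValue != null && !envValue.isBlank()) {
            String v = envValue.trim();
            // Strip surrounding quotes, e.g. from: set VAR="C:\keys\creds.json"
            if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                v = v.substring(1, v.length() - 1);
            }
            return Paths.get(v);
        }
        // Hypothetical fallback location; Paths.get joins the parts with the
        // separator of the current OS instead of a hard-coded '/'.
        return Paths.get(userHome, ".terry", "google-credentials.json");
    }

    public static void main(String[] args) {
        Path path = resolve(System.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
                System.getProperty("user.home"));
        System.out.println(path + (Files.isReadable(path) ? " (readable)" : " (not found)"));
    }
}
```

Checking `Files.isReadable` before handing the path to the client library gives a clearer error than a failure deep inside credential loading.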

ogallagher commented 4 years ago

On macOS, text recognition now seems to work well, as long as the Google OCR quotas are not exceeded. The moveToWidget action successfully finds a known widget’s label on the screen and moves the mouse to that widget’s location.
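How moveToWidget chooses its target point isn't shown here. One plausible approach, sketched with hypothetical names (`LabelLocator`, `TextBox`, `findLabelCenter`), is to match the widget's known label against the returned word boxes, take the center of the match, and offset it by where the captured region sits on the screen. The actual pointer move uses `java.awt.Robot#mouseMove` and is skipped when no display is available.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.Robot;
import java.util.List;
import java.util.Optional;

class LabelLocator {
    // One recognized word and its bounds inside the captured image.
    static final class TextBox {
        final String text;
        final Rectangle bounds;

        TextBox(String text, Rectangle bounds) {
            this.text = text;
            this.bounds = bounds;
        }
    }

    // Return the screen-space center of the first box whose text matches the
    // label (ignoring case). captureOrigin is the screen position of the
    // captured region's top-left corner.
    static Optional<Point> findLabelCenter(List<TextBox> boxes, String label, Point captureOrigin) {
        for (TextBox box : boxes) {
            if (box.text.equalsIgnoreCase(label)) {
                Rectangle b = box.bounds;
                return Optional.of(new Point(
                        captureOrigin.x + b.x + b.width / 2,
                        captureOrigin.y + b.y + b.height / 2));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws AWTException {
        List<TextBox> boxes = List.of(
                new TextBox("File", new Rectangle(10, 4, 30, 16)),
                new TextBox("Edit", new Rectangle(50, 4, 30, 16)));
        Optional<Point> target = findLabelCenter(boxes, "edit", new Point(0, 0));
        System.out.println(target); // Optional[java.awt.Point[x=65,y=12]]
        // Only move the real pointer when a display is available.
        if (target.isPresent() && !GraphicsEnvironment.isHeadless()) {
            new Robot().mouseMove(target.get().x, target.get().y);
        }
    }
}
```

Matching single words keeps the sketch simple; a multi-word label would need neighboring boxes to be joined before comparing.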