Open ogallagher opened 4 years ago
Google Cloud Vision OCR API: returns full strings, as well as individual words with bounding boxes. The pricing here is pretty good for my development and testing situation.
To add it to my project, here’s the Maven pom.xml dependency entry:
Tesseract OCR Library: this solution is also backed by Google, but its usefulness is much more constrained to high-clarity, computer-generated text, and in its basic usage it only returns the contained text string, not information about location or bounds.
Aspose OCR: an OCR library for Java applications that has pretty good functionality, but with pretty steep pricing.
As of this morning I’ve been able to run one successful test request to the Google Vision API through the Widget class, sending a small corner of the screen and receiving a short list of text boxes in return.
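The "small corner of the screen" step can be sketched roughly as below. This is only an illustration under my own assumptions, not the actual Widget code: the class name ScreenCornerSketch, the topLeftCorner helper, and the 0.25 fraction are all hypothetical. It uses only standard java.awt calls, and the captured image would then be sent to the OCR service (that part is left out here).

```java
import java.awt.AWTException;
import java.awt.Dimension;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the project: picks a small top-left
// region of the screen and captures it so it can be sent for OCR.
class ScreenCornerSketch {
    /** Returns the top-left region spanning `fraction` of the screen's width and height. */
    static Rectangle topLeftCorner(Dimension screen, double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        int w = (int) Math.round(screen.width * fraction);
        int h = (int) Math.round(screen.height * fraction);
        // Keep the region at least 1x1 so a capture is always valid.
        return new Rectangle(0, 0, Math.max(1, w), Math.max(1, h));
    }

    /** Captures the given region; this needs a real display. */
    static BufferedImage capture(Rectangle region) throws AWTException {
        return new Robot().createScreenCapture(region);
    }

    public static void main(String[] args) throws AWTException {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available, skipping capture");
            return;
        }
        Dimension screen = Toolkit.getDefaultToolkit().getScreenSize();
        BufferedImage img = capture(topLeftCorner(screen, 0.25)); // assumed fraction
        System.out.println("Captured " + img.getWidth() + "x" + img.getHeight());
    }
}
```

Keeping the captured region small should also keep each request light, which matters when working within the OCR quotas.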
So far, when testing on Windows, the code that reads in the Google API client credentials has not been working.
On a Mac, text recognition now seems to work well, as long as the Google OCR quotas are not exceeded. The moveToWidget action successfully finds a known widget’s label in the screen and moves the mouse to that widget’s location.
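The label lookup behind an action like moveToWidget could look something like the sketch below. It is a guess at the general shape, not the real implementation: WidgetLocatorSketch, TextBox, locate, and the captureOrigin parameter are all hypothetical names, and the matching rule (trimmed, case-insensitive equality) is an assumption. The idea is to find the OCR text box whose text matches the widget's label and return the center of its bounding box in screen coordinates.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: maps a recognized text label to a screen position.
class WidgetLocatorSketch {
    /** One OCR result: recognized text plus its bounds within the captured image. */
    static final class TextBox {
        final String text;
        final Rectangle bounds;

        TextBox(String text, Rectangle bounds) {
            this.text = text;
            this.bounds = bounds;
        }
    }

    /**
     * Finds the first box whose text equals the label (ignoring case and
     * surrounding whitespace) and returns its center, shifted by the
     * screen position of the captured region's top-left corner.
     */
    static Optional<Point> locate(String label, List<TextBox> boxes, Point captureOrigin) {
        String wanted = label.trim();
        for (TextBox box : boxes) {
            if (box.text.trim().equalsIgnoreCase(wanted)) {
                int x = captureOrigin.x + (int) Math.round(box.bounds.getCenterX());
                int y = captureOrigin.y + (int) Math.round(box.bounds.getCenterY());
                return Optional.of(new Point(x, y));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up OCR results for illustration only.
        List<TextBox> boxes = Arrays.asList(
            new TextBox("File", new Rectangle(10, 5, 40, 20)),
            new TextBox("Edit", new Rectangle(60, 5, 40, 20)));
        Optional<Point> target = locate("edit", boxes, new Point(100, 200));
        System.out.println(target.isPresent() ? "Move mouse to " + target.get() : "Label not found");
    }
}
```

With a display available, the returned point could be passed to java.awt.Robot's mouseMove(x, y) to actually move the pointer.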
A major part of using the GUI, in addition to recognition of icons/images, is being able to read text labels (optical character recognition, or OCR). I’ll first research different text recognition libraries and look for the one that best suits my needs.