Textractor is an OCR application for Sailfish OS. Main features:
OCR can be run on:
Cropping supports any reasonable quadrilateral selection, and perspective correction is applied to the selected area. Advanced image preprocessing settings are also available to the user.
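The README does not say how the perspective correction is computed. A standard approach is to solve for the 3x3 homography that maps the four selected corners onto an upright rectangle. The minimal Python sketch below does this with plain Gauss-Jordan elimination. The function names (`homography`, `apply`) and the corner coordinates are hypothetical, and the sketch does not claim to be how Textractor implements it.

```python
# Hypothetical sketch of perspective correction; not Textractor's actual code.
# Computes the 3x3 homography H that maps four selected corners onto an
# upright rectangle, then maps points through it.

Point = tuple[float, float]


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral (three corners collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: list[Point], dst: list[Point]) -> list[float]:
    """Return H (row-major, 9 values, H[8] == 1) such that H maps src[i] to dst[i]."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        # v = (h3 x + h4 y + h5) / (h6 x + h7 y + 1), same rearrangement.
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    return _solve(a, b) + [1.0]


def apply(h: list[float], p: Point) -> Point:
    """Map a point through H, including the projective division by w."""
    x, y = p
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


# Usage: corners of a skewed selection (hypothetical values), ordered
# top-left, top-right, bottom-right, bottom-left, mapped onto a 200 x 300 rectangle.
quad = [(10, 20), (210, 5), (220, 300), (0, 280)]
rect = [(0, 0), (200, 0), (200, 300), (0, 300)]
H = homography(quad, rect)
for corner in quad:
    print(corner, "->", tuple(round(c, 6) for c in apply(H, corner)))
```

To warp the actual image, one would typically use the inverse mapping, e.g. `homography(rect, quad)`, and sample the source image at the mapped position for every destination pixel, so that each output pixel receives a value.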
Recognized text can be edited or copied to the clipboard. Because Sailfish OS (SFOS) is a true multitasking OS, the whole OCR process can run in the background while the device is used for other things.
To build the application, follow this Gist to set up the environment correctly: https://gist.github.com/skvark/49a2f1904192b6db311a
In short:
Add my repositories containing Tesseract OCR and Leptonica to the build machine targets.
Tesseract OCR is only a recognition engine, so Leptonica is used to preprocess the image.
Currently, the following steps are performed before the image is passed to the engine for recognition:
After these steps, the image is passed to Tesseract.
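The individual preprocessing steps are not listed here. As an illustration only, binarization with Otsu's method is a common step before OCR: it picks the gray level that best separates dark text from a light background by maximizing the between-class variance. The pure-Python sketch below works on a grayscale image stored as nested lists. The function names and sample pixels are hypothetical, and this is not claimed to be one of Textractor's steps; Leptonica provides its own thresholding routines.

```python
# Hypothetical sketch of Otsu binarization; not Textractor's actual pipeline.
# Input: grayscale image as a list of rows of ints in 0..255.


def otsu_threshold(pixels: list[list[int]]) -> int:
    """Return the threshold t maximizing between-class variance (class 0: values <= t)."""
    hist = [0] * 256
    for row in pixels:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of pixel values <= t
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is in class 0; no split left to evaluate
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance (up to a constant factor)
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(pixels: list[list[int]], t: int) -> list[list[int]]:
    """Map pixels to 0 (ink, value <= t) or 255 (background, value > t)."""
    return [[0 if p <= t else 255 for p in row] for row in pixels]


# Usage: a tiny made-up image with dark "ink" pixels on a light background.
image = [
    [200, 210, 40, 205],
    [35, 215, 50, 220],
    [208, 30, 212, 45],
]
t = otsu_threshold(image)
print("threshold:", t)
for row in binarize(image, t):
    print(row)
```

A global threshold like this is simple but can struggle with uneven lighting in phone photos; adaptive (tile-based) thresholding is the usual alternative in that case.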
Original:
Preprocessed:
Extracted text:
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.
The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.
D R I N K COFFEE
L Do Stupid Faster
With More Energy