ruifontes / tesseractOCR

GNU General Public License v2.0

TesseractOCR

Information

This add-on uses the free and open source Tesseract OCR engine to perform optical character recognition on an image file (PDF, JPG, TIF or other) without the need to open it. The text file will be placed in the same folder, with the same name as the original file but with the .TXT extension. It also allows access to WIA-enabled scanners to perform OCR on a paper document. The results are shown in a file named OCR.txt placed in the user's Documents folder. Finally, it can also get the text from an accessible PDF, using the XPDF tools. In the NVDA menu, Preferences, a TesseractOCR section is added, where you can configure the following:

English and Portuguese are already included in the add-on. Any other language is downloaded and installed the first time you select it. Note that as the number of selected recognition languages increases, the OCR process takes longer, so we recommend that you select only the languages you need. Note also that the quality of recognition may vary according to the order of the languages. If the recognition result is not satisfactory, you may want to try a different language order.
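To illustrate why language order matters, here is a minimal sketch of how a language selection maps onto a call to the Tesseract command-line program. It is not the add-on's own code: the function name `build_tesseract_command` and the default executable name are hypothetical. The only things taken from Tesseract itself are the `-l` option, the `+` separator between language codes, and the fact that Tesseract appends `.txt` to the output base name.

```python
from pathlib import Path


def build_tesseract_command(image, languages, exe="tesseract"):
    """Return the argument list for one Tesseract CLI call.

    `languages` is an ordered list of Tesseract language codes,
    for example ["por", "eng"]. The order is preserved, because
    Tesseract receives the languages in the order given.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    image = Path(image)
    # Tesseract appends ".txt" to the output base on its own,
    # so the extension of the source file is stripped here.
    output_base = image.with_suffix("")
    return [exe, str(image), str(output_base), "-l", "+".join(languages)]


# Same languages, two different orders: worth comparing when the
# recognition result is not satisfactory.
print(build_tesseract_command("scan.png", ["por", "eng"]))
print(build_tesseract_command("scan.png", ["eng", "por"]))
```

The resulting list could be passed to `subprocess.run` on a machine where Tesseract and the selected language data are installed.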

Shortcut

The default commands are:
Windows+Control+w: scan and recognize a document through the scanner;
Windows+Control+r: recognize the selected document;
Windows+Control+t: get the text from an accessible PDF;
Windows+Control+c: cancel the scanning process. Please note that this command must be issued before the dialog asking whether you want to scan more pages appears.

Then just wait for the text file to appear with the recognized text.

These commands can be modified in the "Input gestures" dialog, in the "TesseractOCR" section.
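Where the text file appears depends on which command was used, as described under Information. The following sketch shows the two locations. Both function names are hypothetical, and it assumes that the Documents folder sits directly under the user's home folder, which may not hold on systems where that folder has been redirected.

```python
from pathlib import Path


def recognized_text_path(source):
    """Path of the text file for a recognized document.

    Same folder and same name as the source, with a .txt extension.
    """
    return Path(source).with_suffix(".txt")


def scanner_text_path(home=None):
    """Path of OCR.txt, where scanner results are written.

    Assumption: Documents is a direct child of the home folder.
    """
    home = Path(home) if home is not None else Path.home()
    return home / "Documents" / "OCR.txt"


print(recognized_text_path("invoices/march.pdf"))
print(scanner_text_path())
```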

Known problems

Languages supported

The supported languages in this version are:

Image types supported

This add-on supports the following types of files: