trufanov-nok / scantailor-universal

ScanTailor Universal - a fork based on Enhanced+Featured+Master versions of ST
http://scantailor.org

Adding segmentation as a way to save a ton of time #120

Open maximka1812 opened 1 year ago

maximka1812 commented 1 year ago

One option is to use the Tesseract OCR API; see https://github.com/maximka1812/Segmentation-Demo-Using-Tesseract-API. Note that the Tesseract API can also be used for automatic page rotation at the initial stage: it has a feature to determine the page orientation (90, 180 or 270 degrees).
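As a rough illustration of the orientation idea, here is a minimal Python sketch that reads the text report Tesseract prints in orientation-and-script-detection mode (for example from `tesseract page.png stdout --psm 0`) and decides how far to rotate the page. The function names, the sample report and the confidence threshold are my own assumptions for illustration, not anything that exists in ST, and the report layout may differ between Tesseract versions.

```python
def parse_osd(osd_text):
    """Extract orientation fields from a Tesseract OSD text report."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "key: value" pairs
            fields[key.strip()] = value.strip()
    return {
        "orientation": int(fields["Orientation in degrees"]),
        "rotate": int(fields["Rotate"]),
        "confidence": float(fields["Orientation confidence"]),
    }


def rotation_to_apply(osd_text, min_confidence=1.0):
    """Return the rotation in degrees, or 0 if the detection is too uncertain.

    min_confidence is an arbitrary illustrative threshold, not a recommended value.
    """
    info = parse_osd(osd_text)
    if info["confidence"] < min_confidence:
        return 0
    return info["rotate"]


# Sample report in the layout assumed above; the values are made up.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 2.47
Script: Latin
Script confidence: 1.11
"""

if __name__ == "__main__":
    print(rotation_to_apply(SAMPLE_OSD))
```

Pages with a low confidence value are left untouched here, so a wrong guess does not silently rotate a correct page.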

All of this can also be done via an external call, using JSON or a similar format to return the information to ST.
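To make the JSON idea concrete, here is a small Python sketch of what such an exchange could look like. The schema (`page`, `regions`, `kind`, `bbox`) is purely hypothetical and invented for this example; ST does not define such a format today, and an actual integration would need to agree on the fields.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    kind: str    # hypothetical labels, e.g. "text" or "picture"
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


def parse_regions(payload: str) -> List[Region]:
    """Parse the hypothetical JSON returned by an external segmentation tool."""
    data = json.loads(payload)
    return [Region(r["kind"], *r["bbox"]) for r in data["regions"]]


def picture_regions(regions: List[Region]) -> List[Region]:
    """Keep only the regions that mixed output mode would treat as pictures."""
    return [r for r in regions if r.kind == "picture"]


# Example payload in the assumed schema; the coordinates are made up.
SAMPLE_JSON = """
{
  "page": "0001.tif",
  "regions": [
    {"kind": "text",    "bbox": [120, 90, 1800, 400]},
    {"kind": "picture", "bbox": [300, 600, 1200, 900]}
  ]
}
"""

if __name__ == "__main__":
    for region in picture_regions(parse_regions(SAMPLE_JSON)):
        print(region)
```

A flat list of labelled rectangles keeps the external tool independent of ST internals; polygons or per-region confidence could be added later if needed.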

Another huge help would be the ability to use image region data for mixed output mode, since ST's automatic mode currently makes frequent errors or misses content.

Another option is a tool that modifies the ScanTailor project file, adding all of this information. Saved FineReader project files could also be used here: they store region data in binary form (and their segmentation quality is better!).