tberg12 / ocular

Ocular is a state-of-the-art historical OCR system.
GNU General Public License v3.0

Change existingExtractionsPath path construction #1

Closed by ryanfb 9 years ago

ryanfb commented 9 years ago

With this patch, if I specify e.g. -inputPath images/pugna -existingExtractionsPath pre_ex, Ocular will look in pre_ex/line_extract/pugna for existing extractions. It should also fall back to normal extraction if no extractions are found there.
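The lookup described above can be sketched roughly as follows. This is only an illustration, not the code in the patch or in Ocular itself: the class name ExtractionPathSketch and both method names are hypothetical, and it assumes that the last component of -inputPath is used as the document name and that "found" means the directory exists and is non-empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names come from the Ocular code base.
public class ExtractionPathSketch {

    // Builds <existingExtractionsPath>/line_extract/<last component of inputPath>.
    static Path existingExtractionsDir(String inputPath, String existingExtractionsPath) {
        Path docName = Paths.get(inputPath).getFileName();
        return Paths.get(existingExtractionsPath, "line_extract").resolve(docName);
    }

    // Assumption: extractions count as "found" if the directory exists and has at least one entry.
    static boolean hasExistingExtractions(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.findAny().isPresent();
        } catch (IOException e) {
            // Treat an unreadable directory the same as a missing one.
            return false;
        }
    }

    public static void main(String[] args) {
        Path dir = existingExtractionsDir("images/pugna", "pre_ex");
        if (hasExistingExtractions(dir)) {
            System.out.println("Reusing extractions in " + dir);
        } else {
            System.out.println("Nothing in " + dir + ", falling back to normal extraction");
        }
    }
}
```

Keeping the path construction separate from the existence check means the fallback decision stays in one place, whatever directory layout ends up being used.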

I've also made a simple Tesseract line splitter whose output works with this option.

ryanfb commented 9 years ago

Closing due to 82d3177429d4dcf7300aa54398ea98c617024f36.

dhgarrette commented 9 years ago

Thank you so much for this! As you saw, I went ahead and implemented a larger rewrite of the pre-extracted line reading/writing functionality.

The Tesseract code is also great to have, and something we've been talking about. I made some changes to your code, and I'll send you a pull request on your repo, but would you be interested in incorporating it into the main Ocular repo? Additionally, would it be possible for you to put together some instructions explaining the minimum requirements for someone to be able to compile the C++ file?

ryanfb commented 9 years ago

Is there more info needed for compiling the C++ file? I believe using pkg-config in the Makefile should handle compiling on Linux/Mac if (lib)tesseract/(lib)leptonica are installed via most package managers.