jwilk-archive / ocrodjvu

OCR for DjVu
GNU General Public License v2.0

ocrodjvu for tesseract 3.04.00 #14

Closed: jwilk closed this issue 6 years ago

jwilk commented 9 years ago

Issue reported by @jsbien:

Tesseract 3.04.00 allows a new parameter: hocr_font_info. I think it should be switched on.

I've patched ocrodjvu and made a simple test. It seems to work OK, but I get the message "Warning in pixReadMemTiff: tiff page 1 not found" twice for every processed page.

The explanation provided on the list

https://groups.google.com/forum/#!topic/tesseract-ocr/Yl58Bn0N168

doesn't seem relevant for this case.
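For readers who want to try the parameter themselves, here is a minimal sketch of switching on `hocr_font_info` from the Tesseract command line. It is not ocrodjvu's actual code or the patch mentioned above. The helper name `build_tesseract_hocr_command` and the file names are hypothetical, and the sketch assumes a `tesseract` binary (3.04 or later) that accepts `-c configvar=value` and ships the stock `hocr` config file.

```python
import subprocess


def build_tesseract_hocr_command(image_path, output_base, language="eng", font_info=True):
    """Build the argument list for a Tesseract run that writes hOCR output.

    Hypothetical helper for illustration; not part of ocrodjvu.
    """
    command = ["tesseract", image_path, output_base, "-l", language]
    if font_info:
        # -c sets one configuration variable for this run only.
        command += ["-c", "hocr_font_info=1"]
    # "hocr" names the config file that enables hOCR output.
    command.append("hocr")
    return command


def run_tesseract_hocr(image_path, output_base, language="eng", font_info=True):
    """Run Tesseract and raise CalledProcessError if it exits with an error."""
    command = build_tesseract_hocr_command(image_path, output_base, language, font_info)
    subprocess.run(command, check=True)
```

Calling `run_tesseract_hocr("page.tif", "page")` would run Tesseract on the placeholder file `page.tif`. Building the argument list separately from running it makes it easy to inspect the exact command when chasing warnings like the one quoted above.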

jwilk commented 9 years ago

Where can I get this version of Tesseract?

I don't see any link to it on https://code.google.com/p/tesseract-ocr/ .

jwilk commented 9 years ago

Comment submitted by @jsbien:

I checked it out following the instructions at https://code.google.com/p/tesseract-ocr/source/checkout.

This is a development version, but the official release is expected very soon.

jwilk commented 9 years ago

Comment submitted by @jsbien:

Forgot to mention that sample results including the log are available at http://teksty.klf.uw.edu.pl/7/.

jwilk commented 9 years ago

I think I'll hold off on any action until 3.04 is officially released.

jsbien commented 6 years ago

I think it's time to close this issue; my problem is better solved by issue #26. On the other hand, Tesseract is now at version 4.0...

jwilk commented 6 years ago

Agreed; let's close it.