kareemu3 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Tesseract generates blank HTML for LZW-compressed TIFF files on Linux #1452

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Download the TIFF file (Application-Checklist.tif).
2. Run Tesseract on it with the following command:
   tesseract Application-Checklist/Application-Checklist.tif Application-Checklist -l eng -psm 4 +hocr.txt

The config file hocr.txt contains the line: tessedit_create_hocr 1
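For anyone trying to reproduce this, the steps above can be scripted roughly as follows. This is a minimal sketch. The helper names write_hocr_config and build_command are hypothetical. The arguments, including "+hocr.txt", are copied verbatim from the report and are not reinterpreted. Tesseract is only invoked if it is on PATH and the input image exists.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def write_hocr_config(path: Path) -> None:
    # Config file from the report: one "variable value" pair per line.
    path.write_text("tessedit_create_hocr 1\n")


def build_command(image: str, outbase: str, config_arg: str = "+hocr.txt") -> list[str]:
    # Argument list copied verbatim from the report (Tesseract 3.01 syntax).
    return ["tesseract", image, outbase, "-l", "eng", "-psm", "4", config_arg]


def main() -> None:
    cmd = build_command("Application-Checklist/Application-Checklist.tif",
                        "Application-Checklist")
    if shutil.which("tesseract") is None or not Path(cmd[1]).exists():
        # Nothing to run here; just show what would be executed.
        print("tesseract or input image missing; would run:", " ".join(cmd))
        return
    write_hocr_config(Path("hocr.txt"))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Running the same script once on the LZW file and once on the CCITT T.6 file isolates compression as the only variable.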

If the same file is saved with CCITT T.6 compression instead, Tesseract produces the expected OCR content in the HTML.
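The bug depends on the file's compression scheme, so it helps to confirm which scheme a given TIFF actually uses. Below is a minimal standard-library sketch that reads the Compression tag (259) from the first IFD of a classic TIFF. In that tag, 5 means LZW and 4 means CCITT T.6 (Group 4). The function name tiff_compression is hypothetical, and BigTIFF and multi-page handling are deliberately left out.

```python
import struct
import sys

# Common values of the TIFF Compression tag (259).
COMPRESSION_NAMES = {
    1: "none",
    2: "CCITT modified Huffman RLE",
    3: "CCITT T.4 (Group 3)",
    4: "CCITT T.6 (Group 4)",
    5: "LZW",
    7: "JPEG",
    8: "Deflate (Adobe)",
    32773: "PackBits",
}


def tiff_compression(data: bytes) -> int:
    """Return the Compression tag value of the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, typ, _n = struct.unpack_from(endian + "HHI", data, entry)
        if tag == 259:
            if typ == 3:  # SHORT: value is left-justified in the 4-byte field
                return struct.unpack_from(endian + "H", data, entry + 8)[0]
            if typ == 4:  # LONG
                return struct.unpack_from(endian + "I", data, entry + 8)[0]
            raise ValueError(f"unexpected type {typ} for Compression tag")
    return 1  # tag absent: the TIFF default is no compression


if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            value = tiff_compression(f.read())
        print(name, value, COMPRESSION_NAMES.get(value, "other"))
```

For example, `python tiff_compression.py Application-Checklist.tif` should report 5 (LZW) for the failing file. The script name here is only illustrative.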

What is the expected output? What do you see instead?
Proper OCR-extracted content in the HTML. Instead, the generated HTML is blank.
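When comparing runs, a quick programmatic check for a "blank" result can help. This is a heuristic sketch using only the standard library's html.parser. It treats the output as blank if no visible text remains outside <head>, <script> and <style>. The names recognized_text and is_blank_ocr are hypothetical and are not part of Tesseract.

```python
from __future__ import annotations

from html.parser import HTMLParser

_SKIPPED = ("head", "script", "style")


class _TextCollector(HTMLParser):
    """Collects character data that appears outside <head>, <script>, <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in _SKIPPED:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in _SKIPPED and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def recognized_text(html: str) -> str:
    # Visible text with runs of whitespace collapsed to single spaces.
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def is_blank_ocr(html: str) -> bool:
    return recognized_text(html) == ""
```

Assuming the output was written to Application-Checklist.html, `is_blank_ocr(open("Application-Checklist.html").read())` would be expected to return True for the LZW input and False for the CCITT T.6 input.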

What version of the product are you using? On what operating system?
Tesseract 3.01 on Linux.

Original issue reported on code.google.com by nareshgo...@gmail.com on 17 Apr 2015 at 8:03

Attachments:

GoogleCodeExporter commented 9 years ago
It works for me with the code from the repository and a recent version of Leptonica.
Tesseract 3.01 is a very old version. In any case, I guess the problem could be in Leptonica (or
its dependency libtiff).

Original comment by zde...@gmail.com on 17 Apr 2015 at 6:13
