Tesseract generates blank html with LZW compressed tif files on LINUX

kareemu3 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

What steps will reproduce the problem?
1. Download the attached TIFF file (Application-Checklist.tif).
2. Run Tesseract on the file with the command:
   tesseract Application-Checklist/Application-Checklist.tif Application-Checklist -l eng -psm 4 +hocr.txt

hocr.txt is a config file containing: tessedit_create_hocr 1

If we run the same file saved with CCITT T.6 compression instead, the HTML 
output contains the correctly extracted OCR content.
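
To confirm which compression a given TIFF actually uses before comparing runs, the Compression tag (259) can be read straight from the file header. The following is a minimal sketch, not part of Tesseract: the function name `tiff_compression` and the table `COMPRESSION_NAMES` are hypothetical, and it assumes a classic (non-BigTIFF) file and inspects only the first image directory.

```python
import struct

# Values of the TIFF Compression tag (259) as defined by the TIFF 6.0
# specification and its common extensions.
COMPRESSION_NAMES = {
    1: "uncompressed",
    2: "CCITT RLE",
    3: "CCITT T.4 (Group 3)",
    4: "CCITT T.6 (Group 4)",
    5: "LZW",
    7: "JPEG",
    8: "Deflate",
    32773: "PackBits",
}


def tiff_compression(data: bytes) -> int:
    """Return the Compression tag value from the first IFD of a classic TIFF.

    Hypothetical helper for illustration; BigTIFF is not handled.
    """
    # The first two bytes give the byte order: "II" little-endian, "MM" big-endian.
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file")

    # Next come the magic number 42 and the offset of the first IFD.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")

    # An IFD is a 2-byte entry count followed by 12-byte entries.
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i
        tag, _field_type, _count = struct.unpack_from(endian + "HHI", data, entry)
        if tag == 259:
            # Compression is a SHORT stored in the first two bytes of the value field.
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return value

    # The specification defines 1 (no compression) as the default when the tag is absent.
    return 1
```

For the attached file, something like `COMPRESSION_NAMES.get(tiff_compression(open("Application-Checklist.tif", "rb").read()))` should report "LZW" for the failing input and "CCITT T.6 (Group 4)" for the working one, if the files are encoded as described above.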

What is the expected output? What do you see instead?
Expected: HTML containing the correctly extracted OCR content.
Instead: the generated HTML is blank.

What version of the product are you using? On what operating system?
Tesseract 3.01 on Linux

Original issue reported on code.google.com by nareshgo...@gmail.com on 17 Apr 2015 at 8:03

Attachments:

Application-Checklist.tif

Tesseract generates blank html with LZW compressed tif files on LINUX #1452