What steps will reproduce the problem?
1. Perform OCR on the attached file "eu-004.tiff" with just "tesseract
eu-004.tiff out hocr".
2. Do the same for the single-page PNG attached below.
3. Note that the OCR output for tables 6.1 and 6.2 is missing many of the
numbers in the single-page PNG. This does not appear to depend on the file
format; the problem seems to occur only when OCR is performed on that page
alone.
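To put a number on the difference between the two runs, a rough sketch like the following could count the recognized words in each hOCR output. It assumes the hOCR marks every word with a span of class "ocrx_word"; the function name `hocr_words` and the script itself are illustrative only and are not part of tesseract.

```python
import re
import sys

# Assumption: each recognized word sits in a <span> whose class is "ocrx_word".
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*>(.*?)</span>", re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")


def hocr_words(hocr_text):
    """Return the non-empty word strings found in an hOCR document."""
    words = []
    for inner in WORD_RE.findall(hocr_text):
        # Drop inline markup such as <strong> or <em> around the word text.
        text = TAG_RE.sub("", inner).strip()
        if text:
            words.append(text)
    return words


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: pass the two hOCR output files to compare.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            print(path, len(hocr_words(handle.read())))
```

Running it on the hOCR output from the TIFF and from the PNG would print one word count per file, so the missing table entries show up as a lower count for the PNG.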
What is the expected output? What do you see instead?
The expected output is what I get for tables 6.1 and 6.2 from the TIFF file,
which is almost perfect. In the OCR output from the PNG, however, only the far
right column has anything, and even it is missing values.
What version of the product are you using? On what operating system?
I'm using version 3.02
Please provide any additional information below.
I have attached the files mentioned above. I'm using OCR to do table
recognition; previously this hadn't been an issue because OCR was being
performed on the whole document rather than one page at a time. Some changes
to the application changed that, and this bug appeared.
Original issue reported on code.google.com by jake.h.e...@gmail.com on 3 Jun 2013 at 9:06