mithilesh1125 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

crlf / newline recognition #650

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
With Tesseract 2.04, character recognition preserved the individual lines of 
the MRZ of an ID card that had been scanned to an image.
Since Tesseract 3.01, the newlines are ignored and the output file is simply 
a stream (one line) of all recognized characters.
We would prefer to keep the line context ...

Original issue reported on code.google.com by w3b4d...@itsvbuero.de on 14 Mar 2012 at 10:01

GoogleCodeExporter commented 9 years ago
Please provide more details (e.g. OS) and example files.

Original comment by zde...@gmail.com on 15 Mar 2012 at 7:20

GoogleCodeExporter commented 9 years ago
I have found out for myself that the line break is still present, but 
unfortunately no longer as CRLF (as it was before). I wish I could configure 
this somehow/somewhere ...

Original comment by w3b4d...@itsvbuero.de on 15 Mar 2012 at 9:38

GoogleCodeExporter commented 9 years ago
The current version of Tesseract uses binary mode for reading and writing.
With this setting you get the same output on all platforms (e.g. you do not 
need to care whether the output comes from Unix, Mac, or Linux). That means: 
output is UTF-8 encoded without a byte order mark, and lines are separated by "\n".

Original comment by zde...@gmail.com on 15 Mar 2012 at 10:00
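For anyone who needs CRLF line endings downstream, the LF-only output described above can be converted after the fact. The following is only a minimal post-processing sketch in Python, not a Tesseract option: the function names `lf_to_crlf` and `convert_file` are made up for illustration, and it assumes the output file is small enough to read into memory.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Return data with every line ending written as CRLF.

    Existing CRLF pairs are normalized to LF first, so running this
    on input that is already CRLF does not produce "\r\r\n".
    """
    normalized = data.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", b"\r\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, rewriting line endings as CRLF."""
    # Binary mode on both sides avoids any platform-specific newline
    # translation; the UTF-8 bytes of the text itself are left untouched.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(lf_to_crlf(data))
```

It could be called as `convert_file("out.txt", "out_crlf.txt")`, where both file names are placeholders.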

GoogleCodeExporter commented 9 years ago
Hi, I have installed Tesseract 3.02 with the Windows installer 
[tesseract-ocr-setup-3.02.02.exe] and ran OCR on the sample file with the command 
"Tesseract phototest.png phototest". The output file [phototest.txt] is generated 
as a single line, and the newline characters are not recognized properly. 

I am seeing this on both Windows Vista and WinXP SP3.

I am hereby attaching the PNG as well as the output file.

Any help will be highly appreciated.  

Original comment by sinhasur...@gmail.com on 2 Jul 2014 at 1:07
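One way to check whether an output file really lacks line breaks, or only looks like a single line in an editor that does not render bare "\n" (older versions of Windows Notepad behave this way), is to count the line-ending bytes directly. This is a hedged diagnostic sketch in Python; the function name `count_line_endings` is hypothetical, and whether this explains the report above is not confirmed in this thread.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one "\n" and one "\r", so subtract the pairs
    # to get the endings that stand alone.
    lf_only = data.count(b"\n") - crlf
    cr_only = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf_only, "cr": cr_only}


def report(path: str) -> dict:
    """Read a file in binary mode and return its line-ending counts."""
    with open(path, "rb") as f:
        return count_line_endings(f.read())
```

Calling `report("phototest.txt")` on the output file and seeing a non-zero `"lf"` count with a zero `"crlf"` count would suggest that the line breaks are present and only the viewer is collapsing them.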
