raffaeldantas / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

text2image fails in generating box-tiff pairs. #1389

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Make sure Lohit Devanagari font is installed. 
2. Run this command in a terminal on the attached files (2.txt, 3.txt):
text2image --text=2.txt --outputbase=hin.LohitDevanagari.exp2 --font='Lohit Devanagari' --fonts_dir=/usr/share/fonts
3. The program crashes.

What is the expected output? What do you see instead?

The box-tiff files should be generated, but they are not. Instead, this
error is produced:

Initializing fontconfig
cluster_text.size() == start_byte_to_box.size():Error:Assert failed:in file stringrenderer.cpp, line 467
Segmentation fault (core dumped)

text2image works fine on other files (see attached 4.txt, 5.txt).

What version of the product are you using? On what operating system?
Tesseract 3.02

Please provide any additional information below.

Original issue reported on code.google.com by adityaku...@gmail.com on 8 Dec 2014 at 2:54

GoogleCodeExporter commented 9 years ago
The problem is that your input files use Windows line endings (\r\n), while
tesseract expects Unix line endings (\n). This is not a problem inside the
text (4.txt, 5.txt), but it causes a problem when the first line of the file
is empty (files 2.txt, 3.txt).

So you can either:
1. remove the empty first line, or
2. use Unix line endings (recommended).

To convert the line endings, you can use the dos2unix utility or an advanced
editor (e.g. Notepad++ on Windows).
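
If dos2unix is not at hand, the same cleanup can be sketched in a few lines of
Python. This is only an illustration, not part of tesseract: the function names
normalize_training_text and normalize_file are made up for this example, and it
applies both workarounds at once (Unix line endings and no empty first line).

```python
from pathlib import Path


def normalize_training_text(data: bytes) -> bytes:
    """Convert Windows/old-Mac line endings to Unix and drop leading empty lines.

    Hypothetical helper for illustration; it is not part of tesseract.
    """
    # Replace Windows endings (\r\n) first, then any stray lone \r.
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Strip newline characters at the very start, i.e. empty first lines.
    return unix.lstrip(b"\n")


def normalize_file(src: str, dst: str) -> None:
    """Read src as raw bytes, normalize it, and write the result to dst."""
    Path(dst).write_bytes(normalize_training_text(Path(src).read_bytes()))
```

For example, normalize_file("2.txt", "2.unix.txt") writes a cleaned copy (the
output name is arbitrary), which can then be passed to text2image with
--text=2.unix.txt. Working on raw bytes avoids any assumption about the text
encoding of the Devanagari input.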

Original comment by zde...@gmail.com on 1 May 2015 at 12:56