raffaeldantas / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Tesseract does not generate the correct number of characters in box file #1441

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a text file with 21 Bengali characters.
2. Make a TIFF image from it.
3. Produce a box file from that TIFF image.
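For reference, the Tesseract 3.0x training instructions generate the box file in step 3 by running the image through the `batch.nochop` and `makebox` configs. The Python sketch below only builds and runs that command. It assumes `tesseract` is on PATH, and it assumes that an existing Bengali traineddata (`-l ben`) is installed to bootstrap the boxes. The file name `ben.myfont.exp0.tif` and the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def makebox_command(tif_path: str, lang: Optional[str] = None) -> List[str]:
    """Build the Tesseract 3.0x command that writes <base>.box next to the image.

    The output base is the image path without its extension, so
    'ben.myfont.exp0.tif' produces 'ben.myfont.exp0.box'.
    """
    tif = Path(tif_path)
    base = tif.with_suffix("")  # strip ".tif"
    cmd = ["tesseract", str(tif), str(base)]
    if lang:
        # Assumption: traineddata for this language is installed and used to bootstrap boxes.
        cmd += ["-l", lang]
    # Config files named in the 3.0x training instructions.
    cmd += ["batch.nochop", "makebox"]
    return cmd


def make_box_file(tif_path: str, lang: Optional[str] = None) -> Path:
    """Run the command if tesseract is on PATH and return the expected .box path."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(makebox_command(tif_path, lang), check=True)
    return Path(tif_path).with_suffix(".box")
```

For example, `makebox_command("ben.myfont.exp0.tif", "ben")` yields `tesseract ben.myfont.exp0.tif ben.myfont.exp0 -l ben batch.nochop makebox`. The boxes this produces are only a starting point and still have to be checked by hand.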

What is the expected output? What do you see instead?

Since there are 21 Bengali characters, Tesseract should generate 21 characters
with coordinates that I can edit. But apparently it does not recognize Bengali
characters, so it produced only 9.
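A Tesseract 3.0x box file has one line per box in the form `<symbol> <left> <bottom> <right> <top> <page>`. You can therefore see the mismatch described above by counting lines against the characters you expect. The following is a minimal sketch that assumes that format and UTF-8 text. `Box`, `parse_box_lines` and `compare_counts` are hypothetical names. For Bengali, a single box symbol may be several Unicode code points, such as a conjunct. The expected list should hold symbols as you intend them to be boxed, not raw code points.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    symbol: str  # may be several code points, e.g. a Bengali conjunct
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(lines: List[str]) -> List[Box]:
    """Parse box-file lines of the form '<symbol> <left> <bottom> <right> <top> <page>'."""
    boxes = []
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        symbol = parts[0]
        left, bottom, right, top = (int(v) for v in parts[1:5])
        page = int(parts[5]) if len(parts) > 5 else 0  # older files may lack a page column
        boxes.append(Box(symbol, left, bottom, right, top, page))
    return boxes


def compare_counts(boxes: List[Box], expected_symbols: List[str]) -> int:
    """Return boxes minus expected symbols; negative means Tesseract found too few boxes."""
    return len(boxes) - len(expected_symbols)
```

To check a real file, read it with `open(path, encoding="utf-8").read().splitlines()` and pass the result to `parse_box_lines`. In the case reported here, the result of `compare_counts` would be 9 - 21 = -12.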

Please use labels and text to provide additional information.
I am using Windows XP SP3 with Tesseract 3.02.02. I have attached all three files
for reference. One important note: I could never make Tesseract work on Windows 7 or 8.

Original issue reported on code.google.com by m.tawfi...@gmail.com on 4 Apr 2015 at 4:59


GoogleCodeExporter commented 8 years ago
Since you may not recognize Bangla characters, I have listed each of the
characters separately.

Original comment by m.tawfi...@gmail.com on 4 Apr 2015 at 5:06


GoogleCodeExporter commented 8 years ago
Someone please help.

Original comment by m.tawfi...@gmail.com on 8 Apr 2015 at 12:51

GoogleCodeExporter commented 8 years ago
You will need to edit the box file before continuing with the remaining steps 
of the training.

Original comment by nguyen...@gmail.com on 19 Apr 2015 at 8:12

GoogleCodeExporter commented 8 years ago
Yes, but how? For example, say my input was "My Hobby is Gardening". Here we
have 18 characters (M y H o b b y i s G a r d e n i n g), so if Tesseract
generates, say, 12 random characters instead of 18, how can I edit that? For
English, Tesseract does not produce wrong characters, but I am working on Bengali.
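There are two kinds of box-file edits. Fixing the symbol column is easy when Tesseract produced exactly one box per character. When the box count is wrong, the boxes themselves have to be split or merged first, usually in a box-editor tool. The Python sketch below only covers the first case. `relabel_box_lines` is a hypothetical helper, not part of Tesseract. It assumes spaces have no box entry, as in Tesseract 3.0x box files. For Bengali, the symbol list should be the hand-split list of characters, like the one attached earlier, rather than individual code points.

```python
from typing import List


def relabel_box_lines(lines: List[str], symbols: List[str]) -> List[str]:
    """Replace the symbol column of each box line with the expected symbols.

    Only valid when there is exactly one box per expected symbol; otherwise the
    boxes must first be split or merged, which this sketch does not do.
    """
    box_lines = [ln for ln in lines if ln.strip()]
    if len(box_lines) != len(symbols):
        raise ValueError(
            f"{len(box_lines)} boxes for {len(symbols)} characters; "
            "split or merge boxes before relabelling"
        )
    relabelled = []
    for sym, ln in zip(symbols, box_lines):
        _old_symbol, rest = ln.split(maxsplit=1)  # keep coordinates and page unchanged
        relabelled.append(f"{sym} {rest.strip()}")
    return relabelled
```

With the English example, `[c for c in "My Hobby is Gardening" if not c.isspace()]` gives 18 symbols. If Tesseract produced 12 boxes, the function raises a `ValueError` instead of guessing, because relabelling alone cannot fix six missing boxes.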

Original comment by m.tawfi...@gmail.com on 23 Apr 2015 at 3:22

GoogleCodeExporter commented 8 years ago

Original comment by zde...@gmail.com on 27 Apr 2015 at 6:46