Unicode for characters in tesseract box file

oliveiracwb / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Unicode for characters in tesseract box file #1433

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

Where do I get the Unicode values for the characters of a language? Tesseract
needs me to edit my box file so that it looks something like this:

s 734 494 751 519 0
p 753 486 776 518 0
r 779 494 796 518 0
i 799 494 810 527 0
n 814 494 837 518 0
g 839 485 862 518 0

How do I get these values? I know what the 0 means in each column, but what about
the rest? I am trying to make a training file for the Bangla language.

Please help.

Original issue reported on code.google.com by m.tawfi...@gmail.com on 11 Mar 2015 at 5:52
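
For readers who land here with the same question: each row of a Tesseract box file has the form "symbol left bottom right top page". The four numbers are pixel coordinates of the character's bounding box, measured from the bottom-left corner of the image, and the trailing 0 is the page index. The file is UTF-8 text, so a Bangla character is written literally in the first field rather than as a code point number. The sketch below parses the sample rows quoted above; the names Box, parse_box_line and code_points are hypothetical helpers made up for illustration and are not part of Tesseract.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One row of a box file (hypothetical helper type, not a Tesseract API)."""
    symbol: str   # the character(s) inside the box, stored as UTF-8 text
    left: int     # x of the left edge, in pixels
    bottom: int   # y of the bottom edge, measured from the image's bottom-left
    right: int    # x of the right edge
    top: int      # y of the top edge
    page: int     # zero-based page index


def parse_box_line(line: str) -> Box:
    # Split from the right so that the symbol field may hold several code
    # points (for example a Bangla consonant followed by a vowel sign).
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def code_points(symbol: str) -> List[str]:
    # Only for inspection: the box file itself stores the literal character.
    return [f"U+{ord(ch):04X}" for ch in symbol]


SAMPLE = """\
s 734 494 751 519 0
p 753 486 776 518 0
r 779 494 796 518 0
i 799 494 810 527 0
n 814 494 837 518 0
g 839 485 862 518 0
"""

boxes = [parse_box_line(row) for row in SAMPLE.splitlines() if row.strip()]

for box in boxes:
    width = box.right - box.left
    height = box.top - box.bottom
    print(box.symbol, code_points(box.symbol), width, height, box.page)
```

In practice the coordinates are usually not typed by hand: a first box file is generated from the training image and then corrected in a box editor, so only the symbol field needs to be checked against the Bangla text.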

GoogleCodeExporter commented 9 years ago

Sorry, I meant the 0 in each row.

Original comment by m.tawfi...@gmail.com on 11 Mar 2015 at 5:54

GoogleCodeExporter commented 9 years ago

Original comment by zde...@gmail.com on 12 Mar 2015 at 10:41

Changed state: Invalid