Open GoogleCodeExporter opened 9 years ago
http://research.ijcaonline.org/volume39/number6/pxc3877076.pdf
Shirorekha Chopping Integrated Tesseract OCR Engine for Enhanced Hindi Language Recognition
Is this approach already included in version 3.02?
Original comment by shreeshrii
on 19 Mar 2013 at 4:25
http://eutypon.gr/eutypon/pdf/e2012-29/e29-a01.pdf
Training Tesseract for Ancient Greek OCR
Useful information for training.
Original comment by shreeshrii
on 20 Mar 2013 at 6:12
OCR results are better when using text made up of words rather than just
letter combinations. See the attached image file and corrected box file.
Here are some of the errors in the box file and the changes that had to be made:
जा 835 3147 869 3170 0 - आ
ध 1851 3147 1871 3170 0 - घ
या 215 3059 242 3082 0 - षा
जा 1584 2971 1617 2994 0 - आ
फा 2063 2971 2095 2994 0 - फ़ा
ज 2148 2971 2171 2994 0 - ज़
श्या 1156 2907 1174 2917 0 - maatraa from next line - deleted
५ 1244 2907 1257 2916 0 - maatraa from next line - deleted
पु 100 2875 133 2906 0 - सु - box size will require change
त 132 2883 148 2906 0 - ख - box size will require change
दु 158 2875 183 2906 0 - दुः - box size will change
र 186 2883 200 2906 0
व 196 2883 212 2906 0 - ख - combine with line above
...
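The "combine with line above" correction for र and व can be sketched in code. This is only an illustration, assuming the usual Tesseract box-file layout of `glyph left bottom right top page`; the helper names `parse_box_line`, `merge_boxes` and `format_box` are hypothetical and not part of Tesseract. The notes after " - " in the list above are comments on each entry, not part of the box format, so the sketch takes clean lines only.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One box-file entry (assumed layout: glyph left bottom right top page)."""
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so the glyph is whatever precedes the five integers.
    glyph, left, bottom, right, top, page = line.rsplit(maxsplit=5)
    return Box(glyph, int(left), int(bottom), int(right), int(top), int(page))


def merge_boxes(a: Box, b: Box, glyph: str) -> Box:
    # The merged box is the smallest rectangle covering both inputs.
    if a.page != b.page:
        raise ValueError("boxes are on different pages")
    return Box(
        glyph,
        min(a.left, b.left),
        min(a.bottom, b.bottom),
        max(a.right, b.right),
        max(a.top, b.top),
        a.page,
    )


def format_box(box: Box) -> str:
    return f"{box.glyph} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


# The two entries that were wrongly split, taken from the list above.
ra = parse_box_line("र 186 2883 200 2906 0")
va = parse_box_line("व 196 2883 212 2906 0")
kha = merge_boxes(ra, va, "ख")
print(format_box(kha))  # ख 186 2883 212 2906 0
```

The merged entry would replace the two original lines in the box file.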
Original comment by shreeshrii
on 26 Mar 2013 at 6:13
Attachments:
I tried training for Hindi using the sanskrit2003 font. However, when using the
generated hin.traineddata, tesseract crashes with a cube error.
The box/tiff pair files are attached.
The training files are too big to attach here.
Original comment by shreeshrii
on 5 Apr 2013 at 1:21
Attachments:
I tried to use the lohit font box/tif pairs provided in the parichit project for
Hindi.
The files had to be renamed to hin.lohit.exp0.tif and hin.lohit.exp0.box instead of hin.lohit.tif;
otherwise there was a font-id error related to the font_properties file.
Once that hurdle was passed, training failed with the following error during
shapeclustering.
4272: Distance = 0.024631: Distance = 0.024896: Stopped with 88 merged, min
dist 0.025000
Master shape_table:Number of shapes = 2083 max unichars = 11 number with
multiple unichars
Read shape table shapetable of 2083 shapes
Reading traindata\san.lohit.exp000.tr ...
Clustering error: Matrix inverse failed with error 1.80869
Clustering error: Matrix inverse failed with error 3.86638
Done!
Reading traindata\san.lohit.exp000.tr ...
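The renaming mentioned before the log can be sketched as a small helper. It assumes the `lang.font.expN` naming pattern that the renamed files follow; the function name `training_name` is hypothetical, and the sketch only computes the new name without touching any files. That the box file started out as hin.lohit.box is also an assumption.

```python
import re
from pathlib import Path


def training_name(filename: str, exp: int = 0) -> str:
    """Return the file name with an .expN component, adding it if missing."""
    path = Path(filename)
    parts = path.stem.split(".")
    # Already in lang.font.expN form: leave it unchanged.
    if len(parts) == 3 and re.fullmatch(r"exp\d+", parts[2]):
        return filename
    if len(parts) != 2:
        raise ValueError(f"expected lang.font.ext, got {filename!r}")
    lang, font = parts
    return f"{lang}.{font}.exp{exp}{path.suffix}"


for name in ("hin.lohit.tif", "hin.lohit.box"):
    print(name, "->", training_name(name))
```

The font name used in the file name (here `lohit`) presumably also has to match the entry in the font_properties file, since the error reported before renaming was a font-id error.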
Original comment by shreeshrii
on 15 Apr 2013 at 12:00
Generalization of Hindi OCR Using Adaptive Segmentation and Font Files
Mudit Agrawal, Huanfeng Ma, and David Doermann
http://lampsrv02.umiacs.umd.edu/pubs/Papers/muditagrawal-09/muditagrawal-09.pdf
Original comment by shreeshrii
on 4 Oct 2014 at 9:45
Issue 1425 has been merged into this issue.
Original comment by zde...@gmail.com
on 22 Feb 2015 at 9:35
Original issue reported on code.google.com by
shreeshrii
on 15 Mar 2013 at 8:29