itwood / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

sanskrit - devanagari - support multiple orthography #1360

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Suggested for 3.04

Please include the following fonts in the training set to support multiple 
Devanagari orthographies:

http://bombay.indology.info/software/fonts/devanagari/

"Nakula follows the Bombay style of Devanagari, with rounded glyphs and little 
thin/thick variation. Sahadeva is in the Calcutta style, with more angular 
glyphs and greater contrast between thin and thick strokes. The actual shapes 
of some of the glyphs (e.g. initial “a”, retroflex “n”) also differ 
according to the style of the font."

http://www.sanskritweb.net/cakram/

"Chandas font represents Southern (most commonly used today) style of 
Devanagari script. And Uttara font represents Northern style of Devanagari 
Script. These styles are sometimes also called Bombay (Southern, contemporary) 
and Calcutta (Northern, old) pen families accordingly. Uttara is today the only 
Devanagari OTF font which supports Northern variations in simple glyphs and in 
ligatures."

http://www.svayambhava.org/

The Siddhanta font and its variants, for the Calcutta style.

The following font may be useful for training to recognize old Sanskrit books:

http://www.sanskritweb.net/itrans/
http://www.sanskritweb.net/itrans/santipurot.zip

"the historical font Santipur being a replica of a later and also anonymous 
font used in Germany (Leipzig, etc.) for typesetting Stenzler’s and 
Geiger’s Elementarbücher and Cappeller’s Wörterbuch and many other books 
published since the middle of the 19th century."
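
As a rough sketch of how the fonts listed above might be fed into training-image generation, the snippet below builds one text2image command line per font. It only constructs the command strings and does not run them. The font family names are taken from the descriptions above and may not match what fontconfig reports once the fonts are installed; the training text file name, the fonts directory, and the "san" language code are assumptions made for illustration.

```python
import shlex

# Font family names as mentioned in this request. The names that
# fontconfig actually reports after installation may differ (assumption).
FONTS = ["Nakula", "Sahadeva", "Chandas", "Uttara", "Siddhanta", "Santipur"]


def text2image_commands(fonts, text_file="san.training_text",
                        fonts_dir="./fonts", lang="san"):
    """Build one text2image command line per font (strings only, not executed).

    text_file, fonts_dir and lang are hypothetical defaults for illustration.
    """
    commands = []
    for font in fonts:
        # Output base follows the usual lang.fontname.expN naming pattern.
        base = f"{lang}.{font.replace(' ', '_')}.exp0"
        args = [
            "text2image",
            f"--text={text_file}",
            f"--outputbase={base}",
            f"--font={font}",
            f"--fonts_dir={fonts_dir}",
        ]
        # shlex.join quotes any argument that contains spaces.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for command in text2image_commands(FONTS):
        print(command)
```

Each printed line can be reviewed and then run by hand, which makes it easy to see which font produced which set of training images.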

Original issue reported on code.google.com by shreeshrii on 30 Oct 2014 at 8:47

GoogleCodeExporter commented 9 years ago
sample pages

Original comment by shreeshrii on 17 Nov 2014 at 7:07

Attachments: