raffaeldantas / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Add support for sanskrit transliteration in latin/roman script #1362

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Please see https://code.google.com/r/shreeshrii-langdata/source/browse?name=iast
for source files that could be used for this.

Original issue reported on code.google.com by shreeshrii on 31 Oct 2014 at 7:37

GoogleCodeExporter commented 9 years ago
# IAST = International Alphabet of Sanskrit Transliteration
# http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration
# http://en.wikipedia.org/wiki/ISO_15919

In fact, this would be usable for transliterating not just Sanskrit but also
various other Indic languages.
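
As a rough illustration of what this character set involves, here is a minimal Python sketch that lists the non-ASCII letters in a transliterated sample and flags any that fall outside the lowercase IAST letters with diacritics. The names `IAST_DIACRITIC_LETTERS`, `non_ascii_letters`, and `outside_iast` are hypothetical helpers made up for this example; they are not part of Tesseract or of the linked langdata files, and the letter set is an assumption based on the usual IAST table, not taken from those files.

```python
import unicodedata

# Assumed set: lowercase IAST letters that carry diacritics (illustrative only).
IAST_DIACRITIC_LETTERS = "āīūṛṝḷḹṃḥṅñṭḍṇśṣ"


def non_ascii_letters(text):
    """Return the sorted, de-duplicated non-ASCII letters in text.

    The text is NFC-normalized first so that a base letter followed by
    combining marks is compared in its precomposed form where one exists.
    """
    normalized = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in normalized if ch.isalpha() and ord(ch) > 127})


def outside_iast(text):
    """Return non-ASCII letters in text that are not in the assumed IAST set."""
    allowed = set(IAST_DIACRITIC_LETTERS)
    return [ch for ch in non_ascii_letters(text) if ch.lower() not in allowed]


if __name__ == "__main__":
    sample = "saṃskṛtam"
    print(non_ascii_letters(sample))
    print(outside_iast(sample))
```

A check like this could help spot accents in older printed transliterations that the plain IAST set does not cover, though what the training data actually needs depends on the source books.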

I will upload new files that support the additional accents used for
transliteration in books from the 1800s (digitized by Google and available on
archive.org).

Original comment by shreeshrii on 5 Nov 2014 at 4:52

GoogleCodeExporter commented 9 years ago
Moved to GitHub: https://github.com/tesseract-ocr/langdata/pull/4

Original comment by joregan on 13 May 2015 at 5:53