tesseract-ocr / langdata_lstm

Data used for LSTM model training
Apache License 2.0

Add Shan language data #46

Open: ronaldaug opened this pull request 3 years ago

ronaldaug commented 3 years ago

Add Shan language data

Shan language: https://en.wikipedia.org/wiki/Shan_language
Language code: shn
Shan Wikipedia: https://shn.wikipedia.org
Sample websites using the Shan language: https://taifreedom.com/, https://shannews.org/, http://shanunicode.com/
Font: https://github.com/kwarm/kwarm-assets/blob/master/panglong.ttf

Reference issue

stweil commented 3 years ago

@ronaldaug, your commits currently show Author: ronaldaug. Is that intended, or should I change it to your real name when merging this pull request? Note that your name is not limited to Latin characters; any UTF-8 characters can be used.

stweil commented 3 years ago

Still missing: shn/okfonts.txt. Which fonts can be used to render Shan text? The panglong.ttf link above does not work. A PangLong font is available at https://saosu-mp.github.io/font/PangLong/PangLong.ttf. Is that the only font for Shan? If so, I can add the missing file when merging the pull request.