About Uyghur(Uighur) langdata

tesseract-ocr / langdata

Source training data for Tesseract for lots of languages

Apache License 2.0


About Uyghur(Uighur) langdata #68

Open gheyret opened 7 years ago

gheyret commented 7 years ago

Hi, I am a native Uyghur speaker. I found some erroneous characters (characters that are not Uyghur) in "uig.training_text" and fixed them. Please update.
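For readers who want to reproduce this kind of check, here is a minimal sketch in Python. It assumes a reference file that lists every valid character (such as the "all_character_forms.txt" attached to this issue) and treats every non-whitespace character in it as allowed. The function names are hypothetical and not part of Tesseract or langdata. Punctuation and digits will be reported too unless they appear in the reference file, and presentation forms are not normalized to their base letters.

```python
from __future__ import annotations

import unicodedata
from collections import Counter


def allowed_characters(reference_text: str) -> set[str]:
    """Treat every non-whitespace character in the reference text as allowed."""
    return {ch for ch in reference_text if not ch.isspace()}


def unexpected_characters(text: str, allowed: set[str]) -> Counter:
    """Count non-whitespace characters in `text` that are not in `allowed`."""
    return Counter(ch for ch in text if not ch.isspace() and ch not in allowed)


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a character for a report."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


def report(reference_path: str, text_path: str) -> None:
    """Print suspicious characters in `text_path`, most frequent first.

    Both paths are assumptions: pass whatever files you actually have.
    """
    with open(reference_path, encoding="utf-8") as f:
        allowed = allowed_characters(f.read())
    with open(text_path, encoding="utf-8") as f:
        counts = unexpected_characters(f.read(), allowed)
    for ch, n in counts.most_common():
        print(n, describe(ch))
```

Calling `report("all_character_forms.txt", "uig.training_text")` would then list each character that the reference file does not cover, together with how often it occurs.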

I made a uig.frequent_words_list file, with the words sorted by frequency. The "all_character_forms.txt" file contains all Uyghur characters and all of their forms. It can be added to the uig.training_text file.
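A frequency-sorted word list of this kind could be built along the following lines. This is only a sketch, not the script used for the attached file: it assumes whitespace-separated words, strips leading and trailing punctuation using Unicode categories, and breaks ties alphabetically. The function names are hypothetical.

```python
from __future__ import annotations

import unicodedata
from collections import Counter
from typing import Optional


def strip_punctuation(token: str) -> str:
    """Remove leading and trailing characters in any Unicode punctuation category."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def frequent_words(text: str, top_n: Optional[int] = None) -> list[str]:
    """Return distinct words ordered by descending frequency, then alphabetically."""
    words = (strip_punctuation(tok) for tok in text.split())
    counts = Counter(w for w in words if w)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    result = [word for word, _ in ranked]
    return result if top_n is None else result[:top_n]
```

Writing the result one word per line, for example with `"\n".join(frequent_words(corpus))`, gives a plain-text list; whether that matches the layout langdata expects is an assumption to check against the existing files in the repository.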

uig.zip all_character_forms.txt

Shreeshrii commented 6 years ago

@gheyret

Please respond to Ray's question in tesseract-ocr/langdata#72:

Anyone know which digits are needed for the other Arabic languages? kur_ara, pus, uig
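One way to gather evidence for this question is to count which digit families actually occur in a representative text. The sketch below is an illustration only and does not say which digits any of these languages need. It distinguishes ASCII digits (U+0030 to U+0039), Arabic-Indic digits (U+0660 to U+0669) and Extended Arabic-Indic digits (U+06F0 to U+06F9); the family labels and the function name are hypothetical.

```python
from collections import Counter

# Code point ranges for the three digit families commonly seen in Arabic-script text.
DIGIT_FAMILIES = {
    "ascii": range(0x0030, 0x003A),
    "arabic_indic": range(0x0660, 0x066A),
    "extended_arabic_indic": range(0x06F0, 0x06FA),
}


def digit_family_counts(text: str) -> Counter:
    """Count how many digits from each family occur in `text`."""
    counts = Counter()
    for ch in text:
        code_point = ord(ch)
        for name, code_points in DIGIT_FAMILIES.items():
            if code_point in code_points:
                counts[name] += 1
                break
    return counts
```

Running `digit_family_counts` over a corpus for each language shows which families are in use there, which a native speaker can then confirm or correct.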

Please test with the new traineddata in the tessdata_best repo and provide feedback.

@theraysmith Hope you have seen this uig info.

EzimetYusup commented 5 years ago

@theraysmith @Shreeshrii @gheyret already answered that question. Any update?