tesseract-ocr / langdata_lstm

Data used for LSTM model training
Apache License 2.0

rename kur to kur_ara #18

Closed. Shreeshrii closed this 3 years ago.

Shreeshrii commented 5 years ago

@stweil Please review.

BisarOmer commented 4 years ago

@Shreeshrii kur is correct and does not need to be changed.

stweil commented 4 years ago

@BisarOmer, would kmr be fine?

Shreeshrii commented 4 years ago

Please see https://en.m.wikipedia.org/wiki/Kurdish_languages

kmr is Kurmanji, written in Latin script.

ckb is Sorani, written in Arabic script.

In Tesseract 3, kur was in Arabic script.

In Tesseract 4, kur was in Latin script and was renamed to kmr.

BisarOmer commented 4 years ago

@stweil kmr is fine for the Kurmanji dialect, which is written in the Latin alphabet, and in tessdata it should remain kmr. For Central Kurdish, which is written in Arabic script, kur or kur_cen would be fine. So far Tesseract has not been trained for Central Kurdish, so it is not available in tessdata. Also, in langdata, kurd_ara needs to be changed.