tesseract-ocr / langdata

Source training data for Tesseract for lots of languages
Apache License 2.0

Maqqaf recognition #129

Closed yarons closed 5 years ago

yarons commented 6 years ago

https://github.com/tesseract-ocr/langdata/blob/106c9b31bea9d30814fc116cbcb9c267dee7df70/heb/heb.numbers#L13

The linked file contains numerous entries that use the ASCII hyphen-minus (ב-, כ-, ל-), but Hebrew text also uses the proper Maqqaf (־, U+05BE) in these positions, for example ב־3 מתוך 4 מקרים ("in 3 out of 4 cases").

With this dataset, the Maqqaf form will be neither recognized nor tested.
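To make the distinction concrete, here is a minimal Python sketch that shows the two characters are different code points and derives a Maqqaf variant for each hyphenated prefix. The helper name `add_maqaf_variants` and the three-item prefix list are hypothetical and only for illustration; this is not how langdata generates or stores its files.

```python
import unicodedata

HYPHEN_MINUS = "\u002d"  # ASCII "-", the character used in the existing entries
MAQAF = "\u05be"         # Hebrew punctuation maqaf "־"


def add_maqaf_variants(patterns):
    """Return each pattern followed by a maqaf copy if it contains "-".

    Hypothetical helper for illustration; not part of langdata or Tesseract.
    """
    result = []
    for pattern in patterns:
        result.append(pattern)
        if HYPHEN_MINUS in pattern:
            result.append(pattern.replace(HYPHEN_MINUS, MAQAF))
    return result


# Assumed sample of hyphenated prefixes: bet, kaf, lamed followed by "-".
prefixes = ["\u05d1-", "\u05db-", "\u05dc-"]
expanded = add_maqaf_variants(prefixes)

# Print the code point and Unicode name of every character in each pattern.
for pattern in expanded:
    print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in pattern])
```

Each original prefix stays in the list and is followed by its Maqqaf counterpart, so both spellings would be covered.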

amitdo commented 6 years ago

Duplicate of https://github.com/tesseract-ocr/langdata/issues/82#issuecomment-320507304

amitdo commented 5 years ago

@zdenop, please close this issue.